
A kind of voice-based data processing method, device and electronic equipment

Info

Publication number
CN108962253A
CN108962253A (application CN201710384412.3A)
Authority
CN
China
Prior art keywords
data
text
voice
interrogation
text data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710384412.3A
Other languages
Chinese (zh)
Inventor
李明修
银磊
卜海亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710384412.3A
Priority to PCT/CN2018/082702 (published as WO2018214663A1)
Publication of CN108962253A
Legal status: Pending

Abstract

Embodiments of the present invention provide a voice-based data processing method, device, and electronic equipment for completely recording an interrogation process. The method includes: obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation; performing recognition according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user; and obtaining interrogation information according to the first text data and the second text data. With embodiments of the present invention, the statements of the doctor and of the patients during an interrogation can be distinguished automatically, the interrogation process can be recorded completely, and content such as case records can be compiled automatically, saving the time spent organizing interrogation records.

Description

A kind of voice-based data processing method, device and electronic equipment
Technical field
The present invention relates to the field of data processing technology, and more particularly to a voice-based data processing method, device, and electronic equipment.
Background art
Speech recognition usually converts speech into text. Traditional recording and speech recognition devices can only convert voice data into the corresponding text and cannot distinguish between speakers. Therefore, when multiple people speak, an effective record cannot be produced by speech recognition alone.
For example, in the actual diagnosis and treatment process of a hospital, at least two people communicate, i.e., at least a doctor and a patient, and sometimes the patient's family members are present as well. Existing speech recognition equipment cannot attribute the collected interrogation records to the corresponding speakers, so the entire interrogation process cannot be recorded comprehensively.
Summary of the invention
Embodiments of the present invention provide a voice-based data processing method for completely recording an interrogation process.
Correspondingly, embodiments of the invention also provide a voice-based data processing device, an electronic device, and a readable storage medium, to guarantee the implementation and application of the above method.
To solve the above problems, an embodiment of the invention discloses a voice-based data processing method, comprising: obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation; performing recognition according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data is voice data, and performing recognition according to the interrogation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; obtaining the voice segments consistent with the reference voiceprint feature to obtain the corresponding first voice data; and obtaining the voice segments not consistent with the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each voice segment in the first voice data, and generating the first text data from the text fragments obtained by recognition; and performing speech recognition on each voice segment in the second voice data, and generating the second text data from the text fragments obtained by recognition. Obtaining interrogation information according to the first text data and the second text data then comprises: sorting the text fragments according to the time order of the voice segments to which the text fragments in the first text data and the second text data respectively correspond, to obtain the interrogation information.
Optionally, the interrogation process data is a text recognition result recognized from voice data, and performing recognition according to the interrogation process data to obtain the corresponding first text data and second text data comprises: performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; recognizing the text fragments with a preset model to determine the language features of the text fragments, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text fragments having the target-user language feature, and generating the second text data from the text fragments having the non-target-user language feature.
An embodiment of the invention also discloses a voice-based data processing device, comprising: a data acquisition module for obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation; a text recognition module for performing recognition according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user; and an information determination module for obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data is voice data, and the text recognition module comprises: a separation submodule for separating first voice data and second voice data from the voice data according to voiceprint features; and a speech recognition submodule for performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, the separation submodule is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, the separation submodule is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; to obtain the voice segments consistent with the reference voiceprint feature to obtain the corresponding first voice data; and to obtain the voice segments not consistent with the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, the separation submodule is configured to identify the voiceprint feature of each voice segment; to count, for each voiceprint feature, the voice segments having that feature; to generate the first voice data from the voice segments of the voiceprint feature with the largest count, wherein the voiceprint feature with the largest count is the voiceprint feature of the target user; and to generate the second voice data from the remaining voice segments.
Optionally, the speech recognition submodule is configured to perform speech recognition on each voice segment in the first voice data and generate the first text data from the text fragments obtained by recognition, and to perform speech recognition on each voice segment in the second voice data and generate the second text data from the text fragments obtained by recognition. The information determination module is configured to sort the text fragments according to the time order of the voice segments to which the text fragments in the first text data and the second text data respectively correspond, to obtain the interrogation information.
Optionally, the interrogation process data is a text recognition result recognized from voice data, and the text recognition module is configured to perform feature recognition on the text recognition result and to separate the first text data and the second text data according to language features.
Optionally, the text recognition module comprises: a fragment division submodule for dividing the text recognition result to obtain corresponding text fragments; a fragment recognition submodule for recognizing the text fragments with a preset model to determine the language features of the text fragments, the language features including a first language feature and a second language feature; and a text generation submodule for generating the first text data from the text fragments having the first language feature and generating the second text data from the text fragments having the second language feature.
An embodiment of the invention also discloses a readable storage medium. When the instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the voice-based data processing method described in one or more of the embodiments of the present invention.
Optionally, an electronic device comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation; performing recognition according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data is voice data, and performing recognition according to the interrogation process data to obtain the corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, separating the first voice data and the second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: matching each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; obtaining the voice segments consistent with the reference voiceprint feature to obtain the corresponding first voice data; and obtaining the voice segments not consistent with the reference voiceprint feature to obtain the corresponding second voice data.
Optionally, determining the first voice data and the second voice data from the voice segments according to voiceprint features comprises: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments, and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data.
Optionally, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each voice segment in the first voice data, and generating the first text data from the text fragments obtained by recognition; and performing speech recognition on each voice segment in the second voice data, and generating the second text data from the text fragments obtained by recognition. Obtaining interrogation information according to the first text data and the second text data then comprises: sorting the text fragments according to the time order of the voice segments to which the text fragments in the first text data and the second text data respectively correspond, to obtain the interrogation information.
Optionally, the interrogation process data is a text recognition result recognized from voice data, and performing recognition according to the interrogation process data to obtain the corresponding first text data and second text data comprises: performing feature recognition on the text recognition result, and separating the first text data and the second text data according to language features.
Optionally, performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features comprises: dividing the text recognition result to obtain corresponding text fragments; recognizing the text fragments with a preset model to determine the language features of the text fragments, the language features including a target-user language feature and a non-target-user language feature; and generating the first text data from the text fragments having the target-user language feature, and generating the second text data from the text fragments having the non-target-user language feature.
Embodiments of the present invention include the following advantages:
By obtaining interrogation process data determined from voice collected during an interrogation, embodiments of the present invention can recognize from the interrogation process data first text data and second text data attributed to different users, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user, so the statements of the doctor and of the patients during the interrogation can be distinguished automatically. Interrogation information is then obtained according to the first text data and the second text data. The interrogation process can thus be recorded completely, and content such as case records can be compiled automatically, saving the time spent organizing interrogation records.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of a voice-based data processing method embodiment of the invention;
Fig. 2 is a flow chart of the steps of another voice-based data processing method embodiment of the invention;
Fig. 3 is a flow chart of the steps of yet another voice-based data processing method embodiment of the invention;
Fig. 4 is a structural block diagram of a voice-based data processing device embodiment of the invention;
Fig. 5 is a structural block diagram of another voice-based data processing device embodiment of the invention;
Fig. 6 is a structural block diagram of an electronic device for voice-based data processing according to an exemplary embodiment of the invention;
Fig. 7 is a structural schematic diagram of an electronic device for voice-based data processing according to another exemplary embodiment of the invention.
Specific embodiment
In order to make the foregoing objectives, features, and advantages of the present invention clearer and more comprehensible, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flow chart of the steps of a voice-based data processing method embodiment of the invention is shown; the method may specifically include the following steps:
Step 102: obtain interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation.
During an interrogation, voice can be collected from the interrogation process by various electronic devices, and the interrogation process data is obtained based on the collected voice data; that is, the interrogation process data may be the collected voice data itself, or may be the text recognition result obtained by converting the collected voice data. Embodiments of the present invention can thus perform recognition on data collected from various interrogation processes.
Step 104: perform recognition according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user.
The interrogation process data can be recognized with different recognition methods according to its data type: for example, voice data can be processed by means of voiceprint features, speech recognition, and the like, while text data can be recognized by text features, so as to obtain the first text data and the second text data distinguished by user. At least two users communicate during the interrogation: one user is the doctor, and the other users are the patient, the patient's family members, and so on. For example, if the data is collected over a doctor's one-day outpatient service, it will include one doctor and several patients, and possibly one or several family members. The doctor can therefore be taken as the target user of the interrogation record, so that the first text data is the interrogation text data corresponding to the doctor, while the text data of at least one other user serves as the second text data, i.e., the interrogation text data corresponding to the patients and family members.
Step 106: obtain interrogation information according to the first text data and the second text data.
Since an interrogation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text fragments, so the interrogation information can be obtained based on the time of each text fragment and the user to which it corresponds.
An example of interrogation information is as follows:
2017-4-23 10:23AM
Doctor A: What symptoms do you have?
Patient B: My XXX is uncomfortable.
Doctor A: Do you have XXX?
Patient B: Yes.
……
In actual processing, the hospital's outpatient records and the like may also be consulted to obtain patient information, so that different patients can be distinguished in the interrogation information.
In conclusion, for interrogation process data determined from voice collected during an interrogation, first text data and second text data can be recognized from the interrogation process data according to different users, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user, so the statements of the doctor and of the patients during the interrogation can be distinguished automatically. Interrogation information is then obtained according to the first text data and the second text data. The interrogation process can thus be recorded completely, and content such as case records can be compiled automatically, saving the time spent organizing interrogation records.
In embodiments of the present invention, the interrogation process data includes voice data and/or the text recognition result recognized from voice data. Different types of interrogation process data are recognized in different ways, so the embodiments of the present invention discuss the processing of each type of interrogation process data separately.
Referring to Fig. 2, a flow chart of the steps of another voice-based data processing method embodiment of the invention is shown. In this embodiment, the interrogation process data is voice data. The method may specifically include the following steps:
Step 202: obtain interrogation process data, the interrogation process data being the voice data collected during the interrogation.
During an interrogation, voice data can be collected from the interrogation process by various electronic devices, for example by recording audio with devices such as a recording pen, a mobile phone, or a computer, to obtain the voice data collected during the interrogation. The voice data may be collected during a single outpatient visit, or may be the voice data collected over multiple outpatient visits of one doctor; embodiments of the present invention impose no restriction on this. The voice data therefore includes the voice data of one doctor and the voice data of at least one patient, and may also include the voice data of at least one family member.
Here, the above step 104 of performing recognition according to the interrogation process data to obtain the corresponding first text data and second text data may include the following steps 204-206.
Step 204: separate first voice data and second voice data from the voice data according to voiceprint features.
A voiceprint is the sound-wave spectrum, carrying verbal information, that an electro-acoustic instrument displays. Voiceprints are both specific and stable: after adulthood, a person's voiceprint remains relatively stable for a long time, so different people can be identified by their voiceprints. Voice data can therefore be recognized by voiceprint features to determine the voice segments corresponding to different users (voiceprint features) in the voice data, so as to obtain the first voice data of the target user and the second voice data of the other users.
Here, separating the first voice data and the second voice data from the voice data according to voiceprint features includes: dividing the voice data into multiple voice segments; and determining the first voice data and the second voice data from the voice segments according to voiceprint features.
Specifically, the voice data can be divided into multiple voice segments. The division may follow a voice division rule, for example dividing at the pauses between sounds; alternatively, the voiceprint feature corresponding to each sound can be determined, and the segments divided according to the different voiceprint features. One piece of voice data can thus be divided into multiple voice segments that stand in sequential order, and different voice segments may have the same or different voiceprint features. Based on the voiceprint features it is then determined whether each voice segment belongs to the first voice data or the second voice data: the voiceprint feature of each voice segment is determined, the voice segments having the target user's voiceprint feature are composed into the first voice data, and the remaining voice segments are composed into the second voice data.
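To make the pause-based division rule concrete, here is a minimal Python sketch, assuming single-channel 16 kHz audio held in a NumPy array normalized to [-1, 1]; the frame length, energy threshold, and minimum pause length are illustrative assumptions, not values specified in this disclosure.

```python
import numpy as np

def split_on_pauses(samples, rate=16000, frame_ms=30,
                    energy_thresh=1e-4, min_pause_frames=10):
    """Divide voice data into voice segments at the pauses between sounds.

    Returns a list of (start_time, end_time) tuples in seconds.
    Assumes `samples` is a 1-D float array normalized to [-1, 1].
    """
    frame_len = rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    # Mean energy per frame; frames below the threshold count as silence.
    energy = np.array([np.mean(samples[i*frame_len:(i+1)*frame_len] ** 2)
                       for i in range(n_frames)])
    voiced = energy >= energy_thresh

    segments, start, silence_run = [], None, 0
    for i, v in enumerate(voiced):
        if v:
            if start is None:
                start = i
            silence_run = 0
        elif start is not None:
            silence_run += 1
            if silence_run >= min_pause_frames:  # a long enough pause ends a segment
                segments.append((start * frame_len / rate,
                                 (i - silence_run + 1) * frame_len / rate))
                start, silence_run = None, 0
    if start is not None:  # flush the final segment
        segments.append((start * frame_len / rate, n_frames * frame_len / rate))
    return segments
```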
In an embodiment of the present invention, before voice data is collected during the interrogation, a segment of the doctor's (target user's) voice can first be collected as reference data, in order to identify the doctor's voiceprint feature, i.e., the reference voiceprint feature, from the reference data. A speech recognition model can also be provided: after the voice data is input into the speech recognition model, the voice segments that match the reference voiceprint data can be separated from the voice segments with other voiceprint features, so as to obtain the voice segments of the target user and the voice segments of the other users. In a doctor's outpatient procedure, the compiled case information usually involves only one doctor while there may be multiple patients, so a large number of case samples corresponding to a particular doctor can be obtained in the above way.
In an alternative embodiment of the invention, the voiceprint feature of the target user can be collected in advance as the reference voiceprint feature, with which the voice data is divided. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: matching each voice segment against the reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; obtaining the voice segments consistent with the reference voiceprint feature to obtain the corresponding first voice data; and obtaining the voice segments not consistent with the reference voiceprint feature to obtain the corresponding second voice data. For a target user such as a doctor, voice data can be collected in advance to extract the voiceprint feature, which serves as the reference voiceprint feature. For new voice data of the target user, each voice segment is then matched against the reference voiceprint feature to determine whether the segment's voiceprint feature is consistent with it. If consistent, the voice segment is considered to match the reference voiceprint feature and is added to the first voice data (the voice data corresponding to the target user). If the voiceprint feature of a voice segment is inconsistent with the reference voiceprint feature, the segment does not match and is added to the second voice data (the voice data corresponding to the non-target users). The first voice data and the second voice data are thus composed of the corresponding voice segments, each voice segment also retaining its ordinal position, which facilitates accurately determining the interrogation information later.
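A sketch of the reference-voiceprint matching described above could look as follows; the speaker-embedding function, cosine similarity, and the 0.7 threshold are illustrative assumptions standing in for whichever voiceprint model is actually used.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two voiceprint feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def separate_by_reference(segments, reference_voiceprint, embed, threshold=0.7):
    """Match each voice segment against the reference voiceprint feature.

    segments:              voice segments in time order
    reference_voiceprint:  feature vector extracted in advance from the
                           target user's (doctor's) enrollment recording
    embed:                 any callable mapping a segment to a voiceprint
                           feature vector (a speaker-embedding model)

    Segments consistent with the reference voiceprint form the first voice
    data; the rest form the second voice data. Order is preserved.
    """
    first_voice_data, second_voice_data = [], []
    for seg in segments:
        if cosine(embed(seg), reference_voiceprint) >= threshold:
            first_voice_data.append(seg)
        else:
            second_voice_data.append(seg)
    return first_voice_data, second_voice_data
```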
In another alternative embodiment of the invention, the voice data can also be divided according to the number of voice segments corresponding to each voiceprint feature. That is, determining the first voice data and the second voice data from the voice segments according to voiceprint features includes: identifying the voiceprint feature of each voice segment; counting the number of voice segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of voice segments and generating the first voice data from the voice segments corresponding to that voiceprint feature; and generating the second voice data from the voice segments not belonging to the first voice data. Owing to the nature of the interrogation process, the interrogation process data may be the recorded data of a doctor's multiple outpatient visits. In this process the doctor tends to occupy more of the time, exchanging interrogations with different patients and their family members, i.e., the doctor (target user) contributes the most voice in the voice data. The target user and the other users can therefore be distinguished by the number of voice segments corresponding to each user, yielding the first voice data and the second voice data. Specifically, the voiceprint features in the voice segments are identified to determine the voiceprint feature contained in each segment; the number of voice segments corresponding to each voiceprint feature is counted; the voiceprint feature with the largest number of segments is determined and taken as the voiceprint feature of the target user, the other voiceprint features being those of the other users; the voice segments with the target user's voiceprint feature are then composed in order into the first voice data, and the other voice segments (those not belonging to the first voice data) are composed in order into the second voice data.
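The majority-count heuristic itself reduces to a frequency count, as in this sketch, assuming each voice segment has already been assigned a voiceprint label (for example by clustering the voiceprint feature vectors; the clustering step is outside the sketch):

```python
from collections import Counter

def separate_by_count(segments, voiceprint_labels):
    """Take the voiceprint with the most segments as the target user (the
    doctor speaks most across the outpatient recordings) and split the
    segments accordingly, preserving their time order."""
    counts = Counter(voiceprint_labels)
    target_label, _ = counts.most_common(1)[0]  # most frequent voiceprint
    first_voice_data = [s for s, l in zip(segments, voiceprint_labels)
                        if l == target_label]
    second_voice_data = [s for s, l in zip(segments, voiceprint_labels)
                         if l != target_label]
    return first_voice_data, second_voice_data
```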
In embodiments of the present invention, since the voice data is collected in a scene where multiple people converse, one voice segment may contain the voiceprint features of multiple users. For the case where multiple voiceprint features are identified in one voice segment: when the different voiceprint features occur at different times, then if a voiceprint feature belongs to another user, the voice segment can be added to the second voice data; and if the voiceprint features include both the target user's voiceprint feature and other users' voiceprint features, the voice segment can be subdivided into sub-segments, which are then added to the corresponding voice data. When the different voiceprint features occur at the same time, i.e., at least two users are speaking at the same time, then if the voiceprint features belong to other users, the voice segment can be added to the second voice data; and if the voiceprint features include both the target user's voiceprint feature and other users' voiceprint features, the segment can be assigned as required, for example classified as a voice segment of the target user to obtain the first voice data, classified as a voice segment of the other users to obtain the second voice data, or added to the voice data of both kinds of users.
Step 206: perform speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
After the first voice data and the second voice data are obtained, the two kinds of voice data can be recognized respectively, to obtain the first text data of the target user and the second text data of the other users.
In an alternative embodiment, performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data includes: performing speech recognition on each voice segment in the first voice data, and generating the first text data from the text fragments obtained by recognition; and performing speech recognition on each voice segment in the second voice data, and generating the second text data from the text fragments obtained by recognition. Recognizing each voice segment of the first voice data yields the text data corresponding to that segment, so the first text data is composed according to the order of the voice segments; the second text data is obtained in the corresponding way. Since the doctor's questions and the patient's answers during the interrogation are all sequential, the corresponding time order is recorded when the voice data is divided into voice segments, so the resulting first text data and second text data also retain the ordinal relation, which facilitates accurately compiling the interrogation information later.
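A sketch of this per-segment recognition follows, with the speech-to-text engine left abstract as a callable (a hypothetical stand-in, not an API named in this disclosure); each text fragment keeps the start time of its voice segment so the ordinal relation survives recognition.

```python
def voice_data_to_text_data(voice_data, speaker, recognize):
    """Recognize each voice segment and keep its start time and speaker.

    voice_data: list of (start_time, samples) tuples for one speaker,
                in time order
    recognize:  any callable mapping segment samples to text
                (a speech-to-text engine)

    Returns a list of (start_time, speaker, text) fragments.
    """
    return [(start, speaker, recognize(samples))
            for start, samples in voice_data]
```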
Step 208: obtain interrogation information according to the first text data and the second text data.
According to the time order of the voice segments corresponding to the first text data and the second text data, the text fragments in the first text data and those in the second text data can be sorted in the corresponding order, such as time order, to obtain the corresponding interrogation information. The interrogation information can record the doctor's questions in the interrogation and the corresponding answers of the patient (or family members), as well as various information such as the doctor's diagnosis and medical advice.
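Given fragments tagged with segment start times as above, merging the first and second text data into interrogation information reduces to a sort, as in this sketch (the speaker labels and times are illustrative):

```python
def build_interrogation_info(first_text_data, second_text_data):
    """Interleave the doctor's and the patients' text fragments by the time
    order of their source voice segments, yielding a dialog transcript."""
    fragments = sorted(first_text_data + second_text_data)  # sorts by start time
    return "\n".join(f"{speaker}: {text}" for _, speaker, text in fragments)

# Example with illustrative fragments:
doctor = [(0.0, "Doctor A", "What symptoms do you have?"),
          (9.5, "Doctor A", "Do you have XXX?")]
patient = [(4.2, "Patient B", "My XXX is uncomfortable."),
           (12.8, "Patient B", "Yes.")]
print(build_interrogation_info(doctor, patient))
```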
Step 210: analyze the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
After the interrogation information is compiled, embodiments of the present invention can also analyze the interrogation information as required to obtain a corresponding analysis result. Since the interrogation is related to disease diagnosis, the analysis result is also related to disease diagnosis, the specifics being determined by the analysis requirements.
For example, the questions doctors commonly ask about each kind of disease can be counted and provided as a reference to less experienced doctors; the interrogation information can be analyzed to develop an artificial intelligence question-answering system for traditional Chinese medicine (or Western medicine); and the symptoms, treatment methods, and the like corresponding to each kind of disease can be determined by statistics, analysis, and similar means.
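As one concrete form of such statistics, the following sketch counts the questions doctors most commonly ask per disease, assuming compiled interrogation records already carry a diagnosed-disease label and the doctor's question sentences (both assumptions for illustration):

```python
from collections import Counter, defaultdict

def common_questions_by_disease(records, top_n=5):
    """records: iterable of (disease, doctor_questions) pairs, where
    doctor_questions is the list of question sentences the doctor asked
    in one interrogation. Returns the top-N questions per disease."""
    per_disease = defaultdict(Counter)
    for disease, questions in records:
        per_disease[disease].update(questions)
    return {disease: counter.most_common(top_n)
            for disease, counter in per_disease.items()}
```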
Referring to Fig. 3, a flow chart of the steps of yet another voice-based data processing method embodiment of the invention is shown. In this embodiment, the interrogation process data is the text recognition result recognized from voice data. The method may specifically include the following steps:
Step 302: obtain the text recognition result recognized from voice data.
The voice data is collected during the interrogation, and the collected voice data is converted into a text recognition result by speech recognition; the text recognition result can be obtained directly.
Here, the above step 104 of performing recognition according to the interrogation process data to obtain the corresponding first text data and second text data may include the following step 304.
Step 304: perform feature recognition on the text recognition result, and separate the first text data and the second text data according to language features.
For data that has been recognized as text, it is unknown which person said each passage, so the text cannot serve directly as interrogation information. Embodiments of the present invention therefore identify the different users from the text recognition result and compile the interrogation information. During an interrogation, the doctor usually asks about symptoms, the user replies with symptom manifestations, and the doctor diagnoses the corresponding disease and states the examinations to be done, the drugs needed, and so on. Based on these features, the sentences of the doctor and of the patient can be identified from the text recognition result, and the first text data and the second text data can then be separated.
That is, embodiments of the present invention can collect in advance the texts of doctors' interrogations and the texts of patients' interrogations, together with the interrogation information already analyzed for each, so as to derive by statistics the language features of the doctor (i.e., the target user) and the language features of the patients and their family members (i.e., the other users), and to establish a corresponding model, so that the texts of different users can be distinguished based on these language features. The language features of different users can be determined, and the preset model established, by means such as machine learning and probability statistics.
Here, embodiments of the present invention can obtain a large number of separated case texts as training data. A separated case text is interrogation information in which the target user and the other users have been identified, such as text information obtained from past recognition. The doctor content data it contains (the first text data of the target user) and the patient content data (the second text data of the other users) can be trained on separately to obtain a doctor content model and a patient content model; the two models can of course also be combined into one preset model, based on which the sentences of the doctor and the sentences of the patient can be recognized.
For example, in the case information obtained from interrogations, doctor content is generally a question containing symptom-class vocabulary, such as 'How do you feel?', 'What symptoms do you have?', or 'Where is it uncomfortable?'; patient content is generally a question containing symptom manifestations or disease-class vocabulary, such as 'Have I caught a cold?' or 'Is it XX disease?'; and doctor content also generally includes declarative sentences containing symptoms and drugs, such as 'You have viral influenza' or 'You can take some XX medicine'. The sentence content of the doctor and that of the patient thus both have fairly distinctive language features, so the doctor content model and the patient content model can be trained from the separated case information.
Performing feature recognition on the text recognition result and separating the first text data and the second text data according to language features includes: dividing the text recognition result to obtain corresponding text fragments; recognizing the text fragments with the preset model to determine the language features of the text fragments, the language features including a first language feature and a second language feature; and generating the first text data from the text fragments having the first language feature, and generating the second text data from the text fragments having the second language feature. The text recognition result can first be divided into sentences according to Chinese sentence features, or divided into multiple text fragments in other ways. Each text fragment is then input into the preset model in order, and the preset model recognizes the text fragments, so that the language feature of each text fragment can be identified. The preset model may of course also be configured to assign an owning user to each text fragment based on the identified language feature. Taking the language feature of the target user as the first language feature and the language feature of the other users as the second language feature, the preset model can determine whether a text fragment has the first language feature or the second language feature. Then, following the division order of the text fragments, the first text data can be generated from the text fragments having the first language feature, and the second text data from the text fragments having the second language feature.
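A minimal sketch of such a preset model follows, assuming separated case texts are available as labeled sentences; character n-grams and a naive Bayes classifier (via scikit-learn) are illustrative choices, since this disclosure only requires that the model be built by machine learning or probability statistics.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative separated case texts: label 1 = doctor (first language
# feature), label 0 = patient/family (second language feature).
train_sentences = ["你感觉怎么样", "有什么症状", "你是病毒性流感",
                   "我是不是感冒了", "我头疼", "是不是XX病"]
train_labels = [1, 1, 1, 0, 0, 0]

# Character n-grams avoid needing a Chinese word segmenter here.
preset_model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 2)),
    MultinomialNB(),
)
preset_model.fit(train_sentences, train_labels)

def separate_text_data(text_fragments):
    """Assign each text fragment a language feature with the preset model,
    then compose the first and second text data in fragment order."""
    labels = preset_model.predict(text_fragments)
    first_text_data = [t for t, l in zip(text_fragments, labels) if l == 1]
    second_text_data = [t for t, l in zip(text_fragments, labels) if l == 0]
    return first_text_data, second_text_data
```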
Step 306: obtain interrogation information according to the first text data and the second text data.
Step 308: analyze the interrogation information to obtain a corresponding analysis result, the analysis result being related to disease diagnosis.
According to the order of the fragments in the first text data and the second text data, the text fragments in the first text data and those in the second text data can be sorted in the corresponding order to obtain the corresponding interrogation information, which can record the doctor's questions in the interrogation and the corresponding answers of the patient (or family members), as well as various information such as the doctor's diagnosis and medical advice.
After the interrogation information is compiled, embodiments of the present invention can also analyze the interrogation information as required to obtain a corresponding analysis result. Since the interrogation is related to disease diagnosis, the analysis result is also related to disease diagnosis, the specifics being determined by the analysis requirements.
For example, the questions doctors commonly ask about each kind of disease can be counted and provided as a reference to less experienced doctors; the interrogation information can be analyzed to develop an artificial intelligence question-answering system for traditional Chinese medicine (or Western medicine); and the symptoms, treatment methods, and the like corresponding to each kind of disease can be determined by statistics, analysis, and similar means.
Given doctors' habit of, and need for, recording cases, based on the above scheme the communication process with a patient can be captured by recording, the sentences of the doctor and of the patient can then be separated, distinguished, and compiled, and the result can be provided to the doctor as a case record in dialog form, which can effectively reduce the time a doctor spends compiling case records.
It should be noted that, for simplicity of description, the method embodiments are all expressed as a series of action combinations, but those skilled in the art should understand that embodiments of the present invention are not limited by the described order of actions, because according to the embodiments some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by embodiments of the present invention.
Referring to Fig. 4, a structural block diagram of a voice-based data processing device embodiment of the invention is shown, which may specifically include the following modules:
A data acquisition module 402, for obtaining interrogation process data, the interrogation process data being determined according to voice data collected during an interrogation.
A text recognition module 404, for performing recognition according to the interrogation process data to obtain corresponding first text data and second text data, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user.
An information determination module 406, for obtaining interrogation information according to the first text data and the second text data.
At least two users communicate during the interrogation: one user is the doctor, and the other users are the patient, the patient's family members, and so on. For example, if the data is collected over a doctor's one-day outpatient service, it will include one doctor and several patients, and possibly one or several family members. The doctor can therefore be taken as the target user of the interrogation record, so that the first text data is the interrogation text data corresponding to the doctor, while the text data of at least one other user serves as the second text data, i.e., the interrogation text data corresponding to the patients and family members. Since an interrogation is usually a question-and-answer process, the first text data and the second text data may each be composed of multiple text fragments, so the interrogation information can be obtained based on the time of each text fragment and the user to which it corresponds.
An example of interrogation information is as follows:
2017-4-23 10:23AM
Doctor A: What symptoms do you have?
Patient B: My XXX is uncomfortable.
Doctor A: Do you have XXX?
Patient B: Yes.
……
In actual processing, the hospital's outpatient records and the like may also be consulted to obtain patient information, so that different patients can be distinguished in the interrogation information.
In conclusion, for interrogation process data determined by collection during an interrogation, first text data and second text data can be recognized from the interrogation process data according to different users, wherein the first text data belongs to a target user and the second text data belongs to other users besides the target user, so the statements of the doctor and of the patients during the interrogation can be distinguished automatically. Interrogation information is then obtained according to the first text data and the second text data. The interrogation process can thus be recorded completely, and content such as case records can be compiled automatically, saving the time spent organizing interrogation records.
Referring to Fig. 5, a structural block diagram of another voice-based data processing device embodiment of the invention is shown, which may specifically include the following modules:
Here, the interrogation process data includes voice data and/or the text recognition result recognized from voice data.
When the interrogation process data is voice data, the text recognition module 404 may include:
A separation submodule 40402, for separating first voice data and second voice data from the voice data according to voiceprint features.
A speech recognition submodule 40404, for performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Here, the separation submodule 40402 is configured to divide the voice data into multiple voice segments, and to determine the first voice data and the second voice data from the voice segments according to voiceprint features.
Preferably, the separation submodule 40402 is configured to match each voice segment against a reference voiceprint feature, wherein the reference voiceprint feature is the voiceprint feature of the target user; to obtain the voice segments consistent with the reference voiceprint feature to obtain the corresponding first voice data; and to obtain the voice segments not consistent with the reference voiceprint feature to obtain the corresponding second voice data.
In an embodiment of the present invention, before voice data is collected during the interrogation, a segment of the doctor's (target user's) voice can first be collected as reference data, in order to identify the doctor's voiceprint feature, i.e., the reference voiceprint feature, from the reference data. A speech recognition model can also be provided: after the voice data is input into the speech recognition model, the voice segments that match the reference voiceprint data can be separated from the voice segments with other voiceprint features, so as to obtain the voice segments of the target user and the voice segments of the other users. In a doctor's outpatient procedure, the compiled case information usually involves only one doctor while there may be multiple patients, so a large number of case samples corresponding to a particular doctor can be obtained in the above way.
Preferably, the separation submodule 40402 is configured to identify the voiceprint feature of each voice segment; to count the number of voice segments corresponding to each voiceprint feature; to determine the voiceprint feature with the largest number of voice segments and generate the first voice data from the voice segments corresponding to that voiceprint feature, wherein the voiceprint feature with the largest count is the voiceprint feature of the target user; and to generate the second voice data from the voice segments not belonging to the first voice data.
Owing to the nature of the interrogation process, the interrogation process data may be the recorded data of a doctor's multiple outpatient visits. In this process the doctor tends to occupy more of the time, exchanging interrogations with different patients and their family members, i.e., the doctor (target user) contributes the most voice in the voice data. The target user and the other users can therefore be distinguished by the number of voice segments corresponding to each user, yielding the first voice data and the second voice data.
In embodiments of the present invention, since the voice data is collected in a scene where multiple people converse, one voice segment may contain the voiceprint features of multiple users. For the case where multiple voiceprint features are identified in one voice segment, the separation submodule 40402 can perform the following processing. When the different voiceprint features occur at different times: if a voiceprint feature belongs to another user, the voice segment can be added to the second voice data; and if the voiceprint features include both the target user's voiceprint feature and other users' voiceprint features, the voice segment can be subdivided into sub-segments, which are then added to the corresponding voice data. When the different voiceprint features occur at the same time, i.e., at least two users are speaking at the same time: if the voiceprint features belong to other users, the voice segment can be added to the second voice data; and if the voiceprint features include both the target user's voiceprint feature and other users' voiceprint features, the segment can be assigned as required, for example classified as a voice segment of the target user to obtain the first voice data, classified as a voice segment of the other users to obtain the second voice data, or added to the voice data of both kinds of users.
Preferably, the speech recognition submodule 40404 is configured to perform speech recognition on each voice segment in the first voice data and generate the first text data from the text fragments obtained by recognition, and to perform speech recognition on each voice segment in the second voice data and generate the second text data from the text fragments obtained by recognition. The information determination module 406 is then configured to sort the text fragments according to the time order of the voice segments to which the text fragments in the first text data and the second text data respectively correspond, to obtain the interrogation information.
Preferably, the interrogation process data is the text recognition result recognized from voice data, and the text recognition module 404 is configured to perform feature recognition on the text recognition result and to separate the first text data and the second text data according to language features.
The text recognition module 404 comprises:
A fragment division submodule 40406, for dividing the text recognition result to obtain corresponding text fragments.
A fragment recognition submodule 40408, for recognizing the text fragments with a preset model to determine the language features of the text fragments, the language features including a first language feature and a second language feature.
Here, embodiments of the present invention can obtain a large number of separated case texts as training data. A separated case text is interrogation information in which the target user and the other users have been identified, such as text information obtained from past recognition. The doctor content data it contains (the first text data of the target user) and the patient content data (the second text data of the other users) can be trained on separately to obtain a doctor content model and a patient content model; the two models can of course also be combined into one preset model, based on which the sentences of the doctor and the sentences of the patient can be recognized. For example, in the case information obtained from interrogations, doctor content is generally a question containing symptom-class vocabulary, such as 'How do you feel?', 'What symptoms do you have?', or 'Where is it uncomfortable?'; patient content is generally a question containing symptom manifestations or disease-class vocabulary, such as 'Have I caught a cold?' or 'Is it XX disease?'; and doctor content also generally includes declarative sentences containing symptoms and drugs, such as 'You have viral influenza' or 'You can take some XX medicine'. The sentence content of the doctor and that of the patient thus both have fairly distinctive language features, so the doctor content model and the patient content model can be trained from the separated case information.
A text generation submodule 40410, for generating the first text data from the text fragments having the first language feature and generating the second text data from the text fragments having the second language feature.
Preferably, the device further include: analysis module 408 obtains phase for analyzing the interrogation informationThe analysis answered is as a result, the analysis result is related to medical diagnosis on disease.
Correspond to the sequence of sound bite according to the first text data and the second text data, it can will be in the first text data respectivelyEach text fragments in text fragments and the second text data, are ranked up according to corresponding sequence, to obtain corresponding interrogationInformation can record doctor in the interrogation information in the problems in interrogation and the answer of respective patient (family members), and doctorThe various information such as raw diagnosis, doctor's advice.
After sorting out interrogation information, the embodiment of the present invention can also be analyzed interrogation information according to demand, obtain phaseThe analysis answered as a result, due to interrogation be it is relevant to medical diagnosis on disease, the analysis result is also related to medical diagnosis on disease, specifically according toIt is determined according to analysis demand.
For example, the common problem of doctor can be counted to every kind of disease, it is supplied to the less doctor's behaviours reference of experience;Interrogation information can be analyzed, develop Chinese medicine (doctor trained in Western medicine) artificial intelligence question answering system etc.;It can also be by counting, analyzingEtc. modes determine the corresponding symptom of every kind of disease, treatment method etc..
With respect to doctors' habits and needs in recording cases, based on the above scheme, the communication process with the patient can be captured by recording; the sentences of the doctor and of the patient are then separated, distinguished, and organized, and provided to the doctor as a case record in dialogue form, which can effectively reduce the time the doctor spends on organizing cases.
As the device embodiments are substantially similar to the method embodiments, the description is relatively simple; for related details, refer to the description of the method embodiments.
Fig. 6 is a structural block diagram of an electronic equipment 600 for voice-based data processing according to an exemplary embodiment. For example, the electronic equipment 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like; it may also be a server-side device, such as a server.
Referring to Fig. 6, the electronic equipment 600 may include one or more of the following components: a processing component 602, a memory 604, a power component 606, a multimedia component 608, an audio component 610, an input/output (I/O) interface 612, a sensor component 614, and a communication component 616.
The processing component 602 generally controls the overall operation of the electronic equipment 600, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 602 may include one or more processors 620 to execute instructions, so as to perform all or part of the steps of the methods described above. In addition, the processing component 602 may include one or more modules to facilitate interaction between the processing component 602 and other components. For example, the processing component 602 may include a multimedia module to facilitate interaction between the multimedia component 608 and the processing component 602.
The memory 604 is configured to store various types of data to support operation on the device 600. Examples of such data include instructions for any application or method operated on the electronic equipment 600, contact data, phone book data, messages, pictures, videos, and the like. The memory 604 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 606 provides power for the various components of the electronic equipment 600. The power component 606 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic equipment 600.
The multimedia component 608 includes a screen providing an output interface between the electronic equipment 600 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 608 includes a front camera and/or a rear camera. When the electronic equipment 600 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.
The audio component 610 is configured to output and/or input audio signals. For example, the audio component 610 includes a microphone (MIC), which is configured to receive external audio signals when the electronic equipment 600 is in an operation mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 604 or sent via the communication component 616. In some embodiments, the audio component 610 further includes a speaker for outputting audio signals.
The I/O interface 612 provides an interface between the processing component 602 and a peripheral interface module, which may be a keyboard, a click wheel, buttons, or the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 614 includes one or more sensors for providing state assessments of various aspects of the electronic equipment 600. For example, the sensor component 614 can detect the open/closed state of the device 600 and the relative positioning of components, for example the display and keypad of the electronic equipment 600; the sensor component 614 can also detect a position change of the electronic equipment 600 or of a component of the electronic equipment 600, the presence or absence of user contact with the electronic equipment 600, the orientation or acceleration/deceleration of the electronic equipment 600, and a temperature change of the electronic equipment 600. The sensor component 614 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 614 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 614 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 616 is configured to facilitate wired or wireless communication between the electronic equipment 600 and other devices. The electronic equipment 600 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 616 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 616 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra wide band (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic equipment 600 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above method.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 604 including instructions, where the instructions can be executed by the processor 620 of the electronic equipment 600 to complete the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium is provided, where, when the instructions in the storage medium are executed by a processor of an electronic equipment, the electronic equipment is enabled to perform a voice-based data processing method, the method comprising: obtaining interrogation process data, where the interrogation process data is determined according to voice data acquired during an interrogation; performing identification according to the interrogation process data to obtain corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to other users except the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data includes voice data and/or a text identification result identified from the voice data.
Optionally, the interrogation process data is voice data; the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, the separating first voice data and second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple speech segments; and determining the first voice data and the second voice data using the speech segments according to voiceprint features.
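A minimal sketch of the division step follows, using pydub's silence-based splitting as one plausible way to cut the recording into speech segments at pauses; the library choice and the thresholds are assumptions and would need tuning for real interrogation audio.

```python
# A minimal sketch of dividing recorded voice data into speech segments by
# splitting at silent pauses. Thresholds here are illustrative only.
from pydub import AudioSegment
from pydub.silence import split_on_silence

def divide_voice_data(path):
    audio = AudioSegment.from_file(path)
    return split_on_silence(
        audio,
        min_silence_len=500,                # pause length in ms that ends a segment
        silence_thresh=audio.dBFS - 16,     # relative loudness treated as silence
    )
```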
Optionally, the determining the first voice data and the second voice data using the speech segments according to voiceprint features comprises: matching each speech segment respectively using a benchmark voiceprint feature, where the benchmark voiceprint feature is the voiceprint feature of the target user; obtaining the speech segments consistent with the benchmark voiceprint feature to obtain the corresponding first voice data; and obtaining the speech segments not consistent with the benchmark voiceprint feature to obtain the corresponding second voice data.
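As an illustration, matching against a benchmark voiceprint could look like the sketch below, assuming each segment already has a voiceprint embedding vector; the cosine-similarity criterion and the 0.75 threshold are assumptions, not values from this embodiment.

```python
# A minimal sketch of splitting segments by similarity to the target user's
# benchmark voiceprint embedding.
import numpy as np

def split_by_benchmark(segments, benchmark, threshold=0.75):
    """segments: list of (audio, embedding); benchmark: embedding vector."""
    first, second = [], []
    for audio, emb in segments:
        sim = np.dot(emb, benchmark) / (np.linalg.norm(emb) * np.linalg.norm(benchmark))
        (first if sim >= threshold else second).append(audio)
    return first, second  # first voice data, second voice data
```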
Optionally, the determining the first voice data and the second voice data using the speech segments according to voiceprint features comprises: identifying the voiceprint feature of each speech segment; counting the number of speech segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of speech segments, and generating the first voice data using the speech segments corresponding to that voiceprint feature; and generating the second voice data using the speech segments not belonging to the first voice data.
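The benchmark-free variant could be sketched as below: cluster the segment voiceprints, then treat the largest cluster as the target user's speech. Agglomerative clustering and its distance threshold are assumed choices for illustration; the embodiment itself only requires counting segments per voiceprint feature.

```python
# A minimal sketch of the majority-voiceprint variant: the speaker with the
# most speech segments is taken as the target user (e.g., the doctor).
from collections import Counter
from sklearn.cluster import AgglomerativeClustering

def split_by_majority(segments):
    """segments: list of (audio, embedding) pairs."""
    embeddings = [emb for _, emb in segments]
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=1.0  # threshold is illustrative
    ).fit_predict(embeddings)
    majority = Counter(labels).most_common(1)[0][0]
    first = [a for (a, _), l in zip(segments, labels) if l == majority]
    second = [a for (a, _), l in zip(segments, labels) if l != majority]
    return first, second
```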
Optionally, the performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each speech segment in the first voice data respectively, and generating the first text data using the text fragments obtained by recognition; and performing speech recognition on each speech segment in the second voice data respectively, and generating the second text data using the text fragments obtained by recognition.
Optionally, the interrogation process data is a text identification result identified from voice data; the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: performing feature identification on the text identification result, and separating the first text data and the second text data according to language features.
Optionally, the performing feature identification on the text identification result and separating the first text data and the second text data according to language features comprises: dividing the text identification result to obtain corresponding text fragments; identifying the text fragments using a preset model to determine the language features of the text fragments, where the language features include a first language feature and a second language feature; and generating the first text data using the text fragments with the first language feature, and generating the second text data using the text fragments with the second language feature.
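Putting the division and identification steps together, a sketch might split the recognized text at sentence-ending punctuation and label each fragment with the preset model from the earlier sketch; the punctuation-based splitting rule is an assumed division strategy, not one prescribed by this embodiment.

```python
# A minimal sketch of separating a text identification result by language
# feature, reusing a trained doctor/patient "preset model".
import re

def split_by_language_feature(text_result, model):
    fragments = [f for f in re.split(r"(?<=[。？！?!])", text_result) if f.strip()]
    labels = model.predict(fragments)
    first = [f for f, l in zip(fragments, labels) if l == "doctor"]   # first text data
    second = [f for f, l in zip(fragments, labels) if l == "patient"] # second text data
    return first, second
```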
Optionally, the method further comprises: analyzing the interrogation information to obtain a corresponding analysis result, where the analysis result is related to disease diagnosis.
Fig. 7 is a structural schematic diagram of an electronic equipment 700 for voice-based data processing according to another exemplary embodiment of the present invention. The electronic equipment 700 may be a server, which may vary considerably due to differences in configuration or performance, and may include one or more central processing units (CPU) 722 (for example, one or more processors), a memory 732, and one or more storage media 730 (such as one or more mass storage devices) storing application programs 742 or data 744. The memory 732 and the storage medium 730 may be transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server. Further, the central processing unit 722 may be configured to communicate with the storage medium 730 and execute, on the server, the series of instruction operations in the storage medium 730.
The server may also include one or more power supplies 726, one or more wired or wireless network interfaces 750, one or more input/output interfaces 758, one or more keyboards 756, and/or one or more operating systems 741, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and the like.
In an exemplary embodiment, the server is configured such that one or more central processing units 722 execute one or more programs including instructions for performing the following operations: obtaining interrogation process data, where the interrogation process data is determined according to voice data acquired during an interrogation; performing identification according to the interrogation process data to obtain corresponding first text data and second text data, where the first text data belongs to a target user and the second text data belongs to other users except the target user; and obtaining interrogation information according to the first text data and the second text data.
Optionally, the interrogation process data includes voice data and/or a text identification result identified from the voice data.
Optionally, the interrogation process data is voice data; the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: separating first voice data and second voice data from the voice data according to voiceprint features; and performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data.
Optionally, the separating first voice data and second voice data from the voice data according to voiceprint features comprises: dividing the voice data into multiple speech segments; and determining the first voice data and the second voice data using the speech segments according to voiceprint features.
Optionally, the determining the first voice data and the second voice data using the speech segments according to voiceprint features comprises: matching each speech segment respectively using a benchmark voiceprint feature, where the benchmark voiceprint feature is the voiceprint feature of the target user; obtaining the speech segments consistent with the benchmark voiceprint feature to obtain the corresponding first voice data; and obtaining the speech segments not consistent with the benchmark voiceprint feature to obtain the corresponding second voice data.
Optionally, the determining the first voice data and the second voice data using the speech segments according to voiceprint features comprises: identifying the voiceprint feature of each speech segment; counting the number of speech segments corresponding to each voiceprint feature; determining the voiceprint feature with the largest number of speech segments, and generating the first voice data using the speech segments corresponding to that voiceprint feature; and generating the second voice data using the speech segments not belonging to the first voice data.
Optionally, the performing speech recognition on the first voice data and the second voice data respectively to obtain the corresponding first text data and second text data comprises: performing speech recognition on each speech segment in the first voice data respectively, and generating the first text data using the text fragments obtained by recognition; and performing speech recognition on each speech segment in the second voice data respectively, and generating the second text data using the text fragments obtained by recognition.
Optionally, the interrogation process data is a text identification result identified from voice data; the performing identification according to the interrogation process data to obtain corresponding first text data and second text data comprises: performing feature identification on the text identification result, and separating the first text data and the second text data according to language features.
Optionally, the performing feature identification on the text identification result and separating the first text data and the second text data according to language features comprises: dividing the text identification result to obtain corresponding text fragments; identifying the text fragments using a preset model to determine the language features of the text fragments, where the language features include a first language feature and a second language feature; and generating the first text data using the text fragments with the first language feature, and generating the second text data using the text fragments with the second language feature.
Optionally, the one or more programs executed by the one or more processors 722 of the server further include instructions for performing the following operation: analyzing the interrogation information to obtain a corresponding analysis result, where the analysis result is related to disease diagnosis.
All embodiments in this specification are described in a progressive manner; each embodiment focuses on the differences from the other embodiments, and for the same or similar parts between the embodiments, the embodiments can be referred to each other.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, the terminal device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce an apparatus for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing terminal device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, which realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal device provide steps for realizing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements, but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without more restrictions, an element defined by the sentence "including a ..." does not exclude the existence of other identical elements in the process, method, article, or terminal device including the element.
A voice-based data processing method, a voice-based data processing device, and an electronic equipment provided by the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
