CN105719659A - Recording file separation method and device based on voiceprint identification - Google Patents

Recording file separation method and device based on voiceprint identification

Info

Publication number
CN105719659A
CN105719659A
Authority
CN
China
Prior art keywords
voiceprint
audio signal
recorded audio
voiceprint feature
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610077739.1A
Other languages
Chinese (zh)
Inventor
廖娟娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd
Priority to CN201610077739.1A
Publication of CN105719659A
Legal status: Pending (current)


Abstract

Embodiments of the invention disclose a recording file separation method based on voiceprint identification. The method comprises the steps of: extracting voiceprint feature data from a recorded audio signal; comparing the voiceprint feature data with preset speech models; and, according to the comparison result, individually encoding the recording-signal units that correspond to identical voiceprint feature data and storing them as separate voice files. Embodiments of the invention further provide a recording file separation device based on voiceprint identification. In this way, when the recorded voice data is voluminous and complex, the different voices are separated and stored individually, so that the user can hear the recorded content clearly, which is convenient for the user's work and life.

Description

Recording file separation method and device based on voiceprint recognition
Technical field
The present invention relates to the field of recording technology, and in particular to a recording file separation method and device based on voiceprint recognition.
Background technology
At present, mobile terminals such as mobile phones are used to make recordings, and the recorded voice data is often voluminous and complex. For example, a conference recording generally contains the voice data of multiple meeting participants. When reviewing the conference content or preparing meeting minutes, it is necessary to listen to the recording file; however, because many people took part in the conference, the recorded content may be hard to make out, which is inconvenient for the user's work and life.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a recording file separation method and device based on voiceprint recognition, so that the user can clearly hear the recorded content from a recording file, which facilitates the user's work and life.
In one aspect, embodiments of the invention provide a recording file separation device based on voiceprint recognition. The device includes: a voiceprint extraction module, a comparison module and an encoding-and-storage module;
wherein the voiceprint extraction module is configured to extract voiceprint feature data from a recorded audio signal;
the comparison module is configured to compare the voiceprint feature data with preset speech models; and
the encoding-and-storage module is configured to, according to the comparison result of the comparison module, individually encode the recorded-audio-signal units corresponding to identical voiceprint feature data and store them as separate audio files.
Optionally, the voiceprint extraction module is specifically configured to:
extract, by means of wavelet transform techniques, the following voiceprint feature data from the recorded audio signal: the pitch spectrum and its contour, the energy of pitch frames, the occurrence frequency and trajectory of pitch formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction.
Optionally, the preset speech models include at least one of the following: a vector quantization model, a stochastic model and a neural network model.
Optionally, the encoding-and-storage module is specifically configured to:
apply enhancement and amplification processing to the recorded-audio-signal units corresponding to identical voiceprint feature data, and individually encode the recorded-audio-signal units after the enhancement and amplification processing.
Optionally, the device further includes a noise reduction module configured to perform noise reduction on the collected audio signal to obtain the recorded audio signal.
In another aspect, embodiments of the invention provide a recording file separation method based on voiceprint recognition. The method includes:
extracting voiceprint feature data from a recorded audio signal;
comparing the voiceprint feature data with preset speech models; and
according to the comparison result, individually encoding the recorded-audio-signal units corresponding to identical voiceprint feature data and storing them as separate audio files.
Optionally, extracting the voiceprint feature data from the recorded audio signal includes:
extracting, by means of wavelet transform techniques, the following voiceprint feature data from the recorded audio signal: the pitch spectrum and its contour, the energy of pitch frames, the occurrence frequency and trajectory of pitch formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction.
Optionally, the preset speech models include at least one of the following: a vector quantization model, a stochastic model and a neural network model.
Optionally, individually encoding the recorded-audio-signal units corresponding to identical voiceprint feature data includes:
applying enhancement and amplification processing to the recorded-audio-signal units corresponding to identical voiceprint feature data; and
individually encoding the recorded-audio-signal units after the enhancement and amplification processing.
Optionally, before extracting the voiceprint feature data from the recorded audio signal, the method further includes:
performing noise reduction on the collected audio signal to obtain the recorded audio signal.
In the recording file separation method and device based on voiceprint recognition provided by embodiments of the present invention, the recording file separation device extracts voiceprint feature data from the recorded audio signal; compares the voiceprint feature data with preset speech models; and, according to the comparison result, individually encodes the recorded-audio-signal units corresponding to identical voiceprint feature data and stores them as separate audio files. In this way, when the recorded voice data is voluminous and complex, the different voices are separated and stored individually, so that the user can hear the recorded content clearly, which is convenient for the user's work and life.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of an optional mobile terminal implementing embodiments of the present invention;
Fig. 2 is a schematic diagram of a communication system in which the mobile terminal provided by embodiments of the present invention can operate;
Fig. 3 is a schematic structural diagram of the recording file separation device based on voiceprint recognition provided by an embodiment of the present invention;
Fig. 4 is a flow chart of the recording file separation method based on voiceprint recognition provided by an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings.
A mobile terminal implementing the embodiments of the present invention is now described with reference to Fig. 1. In the following description, suffixes such as "module", "part" or "unit" used to denote elements are only for facilitating the explanation of the present invention and have no specific meaning in themselves; therefore, "module" and "part" can be used interchangeably.
Mobile terminals can be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (PAD), portable media players (PMP) and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following it is assumed that the terminal is a mobile terminal; however, those skilled in the art will understand that, except for elements used specifically for mobile purposes, the structure according to the embodiments of the present invention can also be applied to fixed-type terminals.
Fig. 1 is a schematic diagram of the hardware structure of an optional mobile terminal implementing embodiments of the present invention.
The mobile terminal 100 may include a wireless communication unit 110, an audio/video (A/V) input unit 120, a user input unit 130, an output unit 150, a memory 160, an interface unit 170, a controller 180 and a power supply unit 190, among others. Fig. 1 shows a mobile terminal with various components, but it should be understood that not all illustrated components are required; more or fewer components may alternatively be implemented. The elements of the mobile terminal are described in detail below.
The wireless communication unit 110 generally includes one or more components that allow radio communication between the mobile terminal 100 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a mobile communication module 112, a wireless Internet module 113, a short-range communication module 114 and a location information module 115.
The mobile communication module 112 sends radio signals to and/or receives radio signals from at least one of a base station (e.g., an access point, a Node B, etc.), an external terminal and a server. Such radio signals may include voice call signals, video call signals, or various types of data sent and/or received as text and/or multimedia messages.
The wireless Internet module 113 supports wireless Internet access for the mobile terminal. This module can be internally or externally coupled to the terminal. The wireless Internet access technologies involved may include WLAN (Wi-Fi), WiBro (wireless broadband), WiMAX (worldwide interoperability for microwave access), HSDPA (high-speed downlink packet access), and so on.
The short-range communication module 114 is a module for supporting short-range communication. Some examples of short-range communication technologies include Bluetooth(TM), radio frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), ZigBee(TM), and so on.
The location information module 115 is a module for checking or obtaining the location information of the mobile terminal. A typical example of the location information module 115 is GPS (global positioning system). According to current technology, the GPS module 115 calculates distance information from three or more satellites along with accurate time information, and applies triangulation to the calculated information, thereby accurately calculating three-dimensional current location information in terms of longitude, latitude and altitude. Currently, methods for calculating position and time information use three satellites and correct the error of the calculated position and time information by using one additional satellite. In addition, the GPS module 115 can calculate speed information by continuously calculating the current location information in real time.
The A/V input unit 120 is used to receive audio or video signals. The A/V input unit 120 may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capture device in a video capture mode or an image capture mode. The processed image frames can be displayed on a display unit 151, stored in the memory 160 (or another storage medium) or transmitted via the wireless communication unit 110; two or more cameras 121 can be provided depending on the structure of the mobile terminal. The microphone 122 can receive sound (audio data) in operating modes such as a phone call mode, a recording mode and a speech recognition mode, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data can be converted into a format that can be sent to a mobile communication base station via the mobile communication module 112. The microphone 122 can implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference produced while receiving and transmitting audio signals.
The user input unit 130 can generate key input data according to commands input by the user to control various operations of the mobile terminal. The user input unit 130 allows the user to input various types of information and may include a keyboard, a touch pad (e.g., a touch-sensitive component that detects changes in resistance, pressure, capacitance, etc. caused by being touched), and so on. In particular, when the touch pad is superimposed on the display unit 151 as a layer, a touch screen can be formed.
The interface unit 170 serves as an interface through which at least one external device can be connected to the mobile terminal 100. For example, the external devices may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on. The interface unit 170 can be used to receive input (e.g., data, information, power, etc.) from an external device and transfer the received input to one or more elements in the mobile terminal 100, or to transfer data between the mobile terminal and an external device.
The output unit 150 may include a display module 151 and an audio output module 152, among others.
The display unit 151 can display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in the phone call mode, the display unit 151 can display a user interface (UI) or a graphical user interface (GUI) related to a call or other communication (e.g., text messaging, multimedia file downloading, etc.). When the mobile terminal 100 is in a video call mode or an image capture mode, the display unit 151 can display captured and/or received images, a UI or GUI showing the video or image and related functions, and so on.
Meanwhile, when the display module 151 and the touch pad are superimposed on each other as layers to form a touch screen, the display module 151 can serve as both an input device and an output device. The display module 151 may include at least one of a liquid crystal display (LCD), a thin-film-transistor LCD (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, and so on. Some of these displays may be constructed to be transparent so that the user can view through them from the outside; these can be called transparent displays, and a typical transparent display is, for example, a TOLED (transparent organic light-emitting diode) display. Depending on the desired embodiment, the mobile terminal 100 may include two or more display units (or other display devices); for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen can be used to detect touch input pressure as well as touch input position and touch input area.
The audio output module 152 can, when the mobile terminal is in a mode such as a call-signal reception mode, a call mode, a recording mode, a speech recognition mode or a broadcast reception mode, convert audio data received by the wireless communication unit 110 or stored in the memory 160 into audio signals and output them as sound. Moreover, the audio output module 152 can provide audio output related to specific functions performed by the mobile terminal 100 (e.g., call-signal reception sound, message reception sound, etc.). The audio output module 152 may include a speaker, a buzzer, and so on.
The memory 160 can store software programs for the processing and control operations performed by the controller 180, or temporarily store data that has been or is to be output. Moreover, the memory 160 can store data about the vibrations and audio signals of various patterns output when a touch is applied to the touch screen.
The memory 160 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and so on. Moreover, the mobile terminal 100 can cooperate with a network storage device that performs the storage function of the memory 160 over a network connection.
The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs control and processing related to voice calls, data communication, video calls, and so on. In addition, the controller 180 may include a multimedia module 181 for reproducing (or playing back) multimedia data; the multimedia module 181 can be constructed within the controller 180 or separately from it. The controller 180 can perform pattern recognition processing so as to recognize handwriting input or picture-drawing input performed on the touch screen as characters or images.
The power supply unit 190 receives external power or internal power under the control of the controller 180 and provides the appropriate power required to operate each element and component.
The various embodiments described herein can be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein can be implemented by using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, such embodiments can be implemented in the controller 180. For a software implementation, embodiments such as processes or functions can be implemented with separate software modules each of which performs at least one function or operation. The software code can be implemented by a software application (or program) written in any suitable programming language, and can be stored in the memory 160 and executed by the controller 180.
So far, the mobile terminal has been described in terms of its functions. In the following, for the sake of brevity, a slide-type mobile terminal among the various types of mobile terminals (folder-type, bar-type, swing-type, slide-type, etc.) is described as an example. Accordingly, the present invention can be applied to any type of mobile terminal and is not limited to slide-type mobile terminals.
The mobile terminal 100 as shown in Fig. 1 may be constructed to operate with wired and wireless communication systems as well as satellite-based communication systems that transmit data via frames or packets.
A communication system in which a mobile terminal according to the present invention can operate is now described with reference to Fig. 2.
Such communication systems can use different air interfaces and/or physical layers. Air interfaces used by communication systems include, for example, frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), the universal mobile telecommunications system (UMTS) (in particular, Long Term Evolution (LTE)), the global system for mobile communications (GSM), and so on. As a non-limiting example, the following description relates to a CDMA communication system, but such teachings apply equally to other types of systems.
Referring to Fig. 2, a CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of base stations (BS) 270, base station controllers (BSC) 275 and a mobile switching center (MSC) 280. The MSC 280 is configured to form an interface with a public switched telephone network (PSTN) 290. The MSC 280 is also configured to form an interface with the BSCs 275, which can be coupled to the base stations 270 via backhaul links. The backhaul links can be constructed according to any of several known interfaces, including, for example, E1/T1, ATM, IP, PPP, frame relay, HDSL, ADSL or xDSL. It will be appreciated that the system as shown in Fig. 2 may include a plurality of BSCs 275.
Each BS 270 can serve one or more sectors (or regions), each sector being covered by an omnidirectional antenna or an antenna pointing in a specific direction radially away from the BS 270. Alternatively, each sector can be covered by two or more antennas for diversity reception. Each BS 270 may be constructed to support multiple frequency assignments, each frequency assignment having a specific spectrum (e.g., 1.25 MHz, 5 MHz, etc.).
The intersection of a sector and a frequency assignment can be called a CDMA channel. The BS 270 can also be called a base transceiver subsystem (BTS) or another equivalent term. In such a case, the term "base station" can be used to broadly denote a single BSC 275 and at least one BS 270. A base station can also be called a "cell site"; alternatively, each sector of a specific BS 270 can be referred to as a cell site.
In Fig. 2, several global positioning system (GPS) satellites 300 are shown. The satellites 300 help locate at least one of the plurality of mobile terminals 100.
Although a plurality of satellites 300 are depicted in Fig. 2, it should be understood that any number of satellites can be used to obtain useful location information. The GPS module 115 as shown in Fig. 1 is generally constructed to cooperate with the satellites 300 to obtain the desired location information. Instead of, or in addition to, GPS tracking technology, other technologies that can track the location of the mobile terminal can be used. Moreover, at least one GPS satellite 300 can optionally or additionally process satellite DMB transmissions.
As a typical operation of the wireless communication system, the BSs 270 receive reverse-link signals from various mobile terminals 100. The mobile terminals 100 usually participate in calls, messaging and other types of communication. Each reverse-link signal received by a certain base station 270 is processed within that BS 270, and the resulting data is forwarded to the relevant BSC 275. The BSC provides call resource allocation and mobility management functions, including coordination of soft handoff processes between the BSs 270. The BSCs 275 also route the received data to the MSC 280, which provides additional routing services for forming an interface with the PSTN 290. Similarly, the PSTN 290 forms an interface with the MSC 280, the MSC 280 forms an interface with the BSCs 275, and the BSCs 275 correspondingly control the BSs 270 to send forward-link signals to the mobile terminals 100.
Based on the above mobile terminal hardware structure and communication system, the embodiments of the method of the present invention are now proposed.
Fig. 3 is a schematic structural diagram of the recording file separation device based on voiceprint recognition provided by an embodiment of the present invention. As shown in Fig. 3, the device provided by this embodiment may include: a voiceprint extraction module 31, a comparison module 32 and an encoding-and-storage module 33.
The voiceprint extraction module 31 is configured to extract voiceprint feature data from a recorded audio signal;
the comparison module 32 is configured to compare the voiceprint feature data with preset speech models; and
the encoding-and-storage module 33 is configured to, according to the comparison result of the comparison module 32, individually encode the recorded-audio-signal units corresponding to identical voiceprint feature data and store them as separate audio files.
First of all, it should be noted that a so-called voiceprint is the spectrum of sound waves carrying verbal information, as displayed by an electroacoustic instrument. The production of human speech is a complex physiological and physical process between the body's language centers and the vocal organs. The vocal organs a person uses when speaking — the tongue, teeth, larynx, lungs and nasal cavity — vary widely in size and form from person to person, so the voiceprint spectrograms of any two people always differ.
Everyone's speech acoustic features have both relative stability and variability; they are not absolute and unchanging. This variation may come from physiology, pathology, psychology, imitation or disguise, and is also related to environmental interference. Nevertheless, because no two people's vocal organs are exactly the same, under ordinary circumstances people can still distinguish the voices of different speakers or judge whether two voices belong to the same person.
Further, voiceprint features are acoustic features related to the anatomical structure of the human speech-production mechanism, such as spectrum, cepstrum, formants, pitch and reflection coefficients, as well as nasal sounds, breathy sounds, hoarse sounds, laughter, and so on. Human voiceprint features are also influenced by socioeconomic status, education level, birthplace, semantics, rhetoric, pronunciation, speaking habits, and so on.
As for voiceprint features — including personal characteristics influenced by one's parents, and rhythm, cadence, speed, intonation and volume — from the perspective of mathematical modeling, the features currently usable by automatic voiceprint recognition models include: acoustic features, such as the cepstrum; lexical features, such as speaker-dependent word n-grams and phoneme n-grams; prosodic features, such as n-grams describing pitch and energy "gestures"; language, dialect and accent information; and channel information, such as which channel is used.
In practical application, while the user is recording, the voiceprint extraction module 31 is specifically configured to extract, by means of wavelet transform techniques, the following voiceprint feature data from the recorded audio signal: the pitch spectrum and its contour, the energy of pitch frames, the occurrence frequency and trajectory of pitch formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction.
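As a rough illustration of per-frame feature extraction, the sketch below computes log energy and an autocorrelation-based pitch estimate for each frame of a mono signal. This is a deliberately simplified stand-in for the wavelet-based feature set listed above: the function name, frame sizes and the two features chosen are assumptions for illustration, not part of the patent.

```python
import math

def frame_features(samples, frame_len=400, hop=200, sample_rate=8000):
    """Split a mono signal into frames and compute simple per-frame features.

    Returns a list of (log_energy, pitch_hz) tuples.  Illustrative only:
    log energy stands in for the pitch-frame energy feature, and the
    autocorrelation-peak pitch estimate stands in for the pitch spectrum.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-12)
        # crude pitch estimate: lag of the autocorrelation peak in 50-400 Hz
        best_lag, best_r = 0, 0.0
        for lag in range(sample_rate // 400, sample_rate // 50):
            r = sum(frame[i] * frame[i - lag] for i in range(lag, frame_len))
            if r > best_r:
                best_r, best_lag = r, lag
        pitch = sample_rate / best_lag if best_lag else 0.0
        features.append((log_energy, pitch))
    return features
```

For a pure 200 Hz tone sampled at 8 kHz, every frame's pitch estimate lands on (or very near) 200 Hz, since the autocorrelation peaks at the 40-sample period.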
In this embodiment, the preset speech models include at least one of the following: a vector quantization model, a stochastic model and a neural network model. The comparison module 32 compares the saved voiceprint feature data with the preset speech models; that is, among the saved voiceprint feature data, the data that matches a given preset speech model is treated as identical voiceprint feature data. In this way the main voice signals within the preset signal range can be retained and the remaining signals automatically filtered out, thereby obtaining relatively complete voiceprint signals for every main speaker.
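Of the three model types named, vector quantization is the simplest to sketch: each preset speaker model is a small codebook of centroid vectors, and a sequence of feature frames is matched to the speaker whose codebook yields the lowest average distortion. The function names, codebooks and two-dimensional toy features below are illustrative assumptions, not the patent's actual models.

```python
def vq_distortion(frames, codebook):
    """Average distortion of feature frames against one speaker's VQ codebook:
    for each frame, the squared distance to its nearest codeword."""
    total = 0.0
    for f in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook)
    return total / len(frames)

def match_speaker(frames, models):
    """Return the name of the preset speaker model with the lowest distortion.

    `models` maps a speaker label to that speaker's codebook (list of
    centroid vectors).
    """
    return min(models, key=lambda name: vq_distortion(frames, models[name]))
```

A stochastic model (e.g., a Gaussian mixture) or a neural network would replace the distortion score with a likelihood or a classifier output, but the comparison step — score each preset model, keep the best match — stays the same.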
The encoding-and-storage module 33 is specifically configured to: apply enhancement and amplification processing to the recorded-audio-signal units corresponding to identical voiceprint feature data, and individually encode the recorded-audio-signal units after the enhancement and amplification processing. That is, the encoding-and-storage module 33 separates out the complete voiceprint signals of all main speakers and, through enhancement and amplification processing, obtains the separated voiceprint signal units, each corresponding to identical voiceprint feature data; it then encodes each voiceprint signal unit, restores it into an independent acoustic signal, and stores it separately as an audio file.
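The encode-and-store step can be sketched with the standard-library `wave` module: each speaker's signal units are amplified by a simple gain, encoded as 16-bit PCM, and written to that speaker's own WAV file. The flat gain standing in for "enhancement and amplification processing", the file-naming scheme and the function name are all assumptions for illustration.

```python
import struct
import wave

def store_separately(segments_by_speaker, sample_rate=8000, gain=2.0, out_dir="."):
    """Write each speaker's concatenated signal units to a separate WAV file.

    `segments_by_speaker` maps a speaker label to a list of sample lists
    (floats in [-1, 1]).  Returns the list of paths written.
    """
    paths = []
    for speaker, segments in segments_by_speaker.items():
        path = f"{out_dir}/{speaker}.wav"
        with wave.open(path, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)          # 16-bit PCM
            w.setframerate(sample_rate)
            for seg in segments:
                # amplify, clip to [-1, 1], then encode as little-endian int16
                pcm = b"".join(
                    struct.pack("<h", int(max(-1.0, min(1.0, s * gain)) * 32767))
                    for s in seg
                )
                w.writeframes(pcm)
        paths.append(path)
    return paths
```

A real device would more likely use a compressed codec (e.g., AMR or AAC) for the individual encoding; WAV is used here only because it needs no dependency beyond the standard library.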
It follows that, in this embodiment, at least one clear, individually storable audio file can be separated out of a single recording file, making it easy for the user to hear the recorded content clearly.
Further, the device provided by this embodiment may also include a noise reduction module configured to, while the user is recording, perform noise reduction on the collected audio signal to obtain the recorded audio signal.
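The patent does not specify a noise reduction algorithm, so the sketch below uses the simplest possible placeholder: moving-average smoothing followed by a noise gate that zeroes low-amplitude samples. Function name, window size and gate threshold are assumptions; a real implementation would use spectral methods.

```python
def denoise(samples, window=5, gate=0.02):
    """Minimal noise-reduction sketch: smooth with a moving average, then
    zero any sample whose magnitude falls below the gate threshold.
    Illustrates the noise reduction module's role in producing the recorded
    audio signal; not the patent's actual algorithm."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return [s if abs(s) >= gate else 0.0 for s in smoothed]
```

Low-level background noise is removed entirely, while a signal well above the gate passes through essentially unchanged.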
In practical application, when the recording file separation device based on voiceprint recognition provided by this embodiment performs recording file separation, the process can be divided into two stages.
The first stage is the voiceprint extraction and separation stage, which specifically includes voiceprint feature extraction, voiceprint comparison and the discrimination decision. Voiceprint feature extraction means that the voiceprint extraction module 31 extracts all separable voiceprint feature data from the recording file; voiceprint comparison means comparing and verifying the separated voiceprint feature data for a finer voiceprint separation; and the discrimination decision means retaining the main voice signals within the preset signal range while automatically filtering out the remaining signals, obtaining relatively complete voiceprint signals for every main speaker.
The second stage is the voiceprint recombination stage. Specifically, the multiple saved main-speaker voiceprint signals are individually encoded, restored into independent acoustic signals, and stored separately as audio files.
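The two stages can be strung together in a toy end-to-end flow: stage one labels each frame with the closest preset model, stage two regroups the frames by label so each speaker's signal can be stored separately. The one-speaker-per-frame assumption and the single-number "voiceprint" (mean absolute amplitude) are gross simplifications made only so the control flow fits in a few lines.

```python
def separate_recording(samples, models, frame_len=400, hop=400):
    """Two-stage sketch of the separation flow described above.

    `models` maps a speaker label to a toy scalar voiceprint.  Returns a
    dict mapping each detected speaker label to that speaker's frames.
    """
    # Stage 1: voiceprint extraction, comparison and discrimination decision
    labelled = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        feat = sum(abs(x) for x in frame) / frame_len
        label = min(models, key=lambda m: abs(models[m] - feat))
        labelled.append((label, frame))
    # Stage 2: voiceprint recombination - regroup frames per speaker
    grouped = {}
    for label, frame in labelled:
        grouped.setdefault(label, []).append(frame)
    return grouped
```

Each group returned by stage two would then be fed to the encoding-and-storage step to produce one independent audio file per speaker.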
Applying the recording file separation device based on voiceprint recognition provided by this embodiment has the following advantages: speech containing voiceprint features is easy and natural to obtain, and voiceprint extraction can be completed without the speaker's conscious effort, so user acceptance is high; acquiring speech for recognition is low in cost and simple to use — a single microphone suffices, and no extra recording equipment is needed when communication devices are used; it is suitable for remote identity confirmation, since only a microphone, telephone or mobile phone is needed to realize remote login over a network (a communication network or the Internet); the algorithms for voiceprint identification and confirmation have low complexity; and, combined with other measures such as content discrimination through speech recognition, the accuracy rate can be further improved.
For example, when a conference discussion with many participants is recorded, applying the recording file separation device based on voiceprint recognition provided by this embodiment makes it possible to play back each person's viewpoints automatically; or, after the meeting, when reviewing the conference content, simply playing back the files reveals the full content, which greatly facilitates meeting-minutes work. Meanwhile, members who did not attend the meeting can also learn the entire conference content from the recording files.
With the recording file separation device based on voiceprint recognition provided by this embodiment, when the recorded voice data is voluminous and complex, the different voices are separated and stored individually, so that the user can hear the recorded content clearly, which is convenient for the user's work and life.
In practical applications, the voiceprint extraction module 31, the comparison module 32, and the encoding and storage module 33 can all be implemented by a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or the like in the voiceprint-recognition-based recording file separation apparatus.
Fig. 4 is a flowchart of the voiceprint-recognition-based recording file separation method provided by an embodiment of the present invention. As shown in Fig. 4, the method provided by this embodiment may be executed by a voiceprint-recognition-based recording file separation apparatus, which may be arranged in a mobile terminal such as a mobile phone or a tablet computer. Specifically, the method provided by this embodiment may include:
Step 401: extract the voiceprint feature data from the recorded audio signal.
Step 402: compare the voiceprint feature data with preset speech models.
Step 403: according to the comparison result, separately encode the recorded-signal units corresponding to identical voiceprint feature data and store them as independent audio files.
It should first be noted that a so-called voiceprint is the spectrum of sound waves carrying verbal information, as displayed by an electro-acoustic instrument. The generation of human speech is a complex physiological and physical process involving the body's language center and the vocal organs. The vocal organs a person uses when speaking, namely the tongue, teeth, larynx, lungs, and nasal cavity, vary greatly from person to person in size and form, so the voiceprint spectrograms of any two people always differ.
A person's speech acoustic features have both relative stability and variability; they are not absolute or unchanging. This variation may come from physiology, pathology, psychology, imitation, or disguise, and is also related to environmental interference. Nevertheless, because each person's vocal organs differ, under ordinary circumstances people can still distinguish the voices of different people or judge whether two voices belong to the same person.
Further, voiceprint features include acoustic features related to the anatomical structure of the human pronunciation mechanism, such as the spectrum, cepstrum, formants, fundamental tone, and reflection coefficients, as well as nasal sound, breathy voice, hoarseness, and laughter. Human voiceprint features are also influenced by socioeconomic status, education level, birthplace, semantics, rhetoric, pronunciation, and speech habits.
As for voiceprint features, personal characteristics or prosody influenced by one's parents, such as rhythm, speed, intonation, and volume, can be modeled mathematically. The features currently usable by automatic voiceprint recognition models include: acoustic features, such as the cepstrum; lexical features, such as speaker-dependent word n-grams and phoneme n-grams; prosodic features, such as n-grams describing fundamental tone and energy "gestures"; language, dialect, and accent information; and channel information, such as which channel is used.
Specifically, in the above step 401, while the user is recording, the following voiceprint feature data can be extracted from the recorded audio signal by wavelet transform techniques: the fundamental tone spectrum and its contour, the energy of fundamental tone frames, the occurrence frequency and trajectory of fundamental tone formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, MFCC, and perceptual linear prediction.
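As a concrete illustration of one feature in this list, the fundamental tone of a voiced frame can be estimated from its autocorrelation. This sketch uses plain autocorrelation rather than the wavelet transform named above, and the frame length and the 80-400 Hz search band are illustrative assumptions.

```python
import numpy as np


def estimate_fundamental(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by
    picking the autocorrelation peak inside a plausible pitch band."""
    frame = frame - frame.mean()
    # full autocorrelation, then keep the non-negative lags only
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    shortest = int(sample_rate / fmax)   # lag of the highest allowed pitch
    longest = int(sample_rate / fmin)    # lag of the lowest allowed pitch
    best_lag = shortest + int(np.argmax(corr[shortest:longest]))
    return sample_rate / best_lag
```

A 220 Hz sine frame sampled at 16 kHz, for instance, yields an estimate within a few hertz of 220.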
In this embodiment, the preset speech models include at least one of the following: a vector quantization model, a stochastic model, and a neural network model. The extracted voiceprint feature data are compared with the preset speech models; that is, among the extracted voiceprint feature data, the data that match a preset speech model are taken as identical voiceprint feature data. In this way, the main body signals within the preset signal range can be retained and the remaining signals automatically filtered out, so that comparatively complete main body voiceprint signals are obtained.
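Of these model types, the vector quantization model is the simplest to sketch. In the hedged example below, the codebooks, the model names, and the distortion threshold are all hypothetical: feature vectors are matched to the preset model whose codebook they are closest to, and feature data matching no model closely enough are filtered out.

```python
import numpy as np


def vq_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword;
    lower distortion means a better match to that speaker model."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def match_preset_model(features, models, threshold=1.0):
    """Return the name of the best-matching preset model, or None so
    the signal is filtered out when nothing is within the preset range."""
    scores = {name: vq_distortion(features, cb) for name, cb in models.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= threshold else None
```

Feature data assigned the same model name are then treated as "identical voiceprint feature data" for the encoding step.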
In the above step 403, the recorded-signal units corresponding to identical voiceprint feature data are subjected to enhancement and amplification processing, and the enhanced and amplified recorded-signal units are then separately encoded. That is, the complete main body voiceprint signals are separated out one by one and, after enhancement and amplification, the separated voiceprint signal units are obtained, each corresponding to identical voiceprint feature data; these voiceprint signal units are then encoded, restored into independent sound signals, and stored as separate audio files.
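The embodiment does not specify the enhancement and amplification algorithm; one plausible reading is a simple peak normalization applied to each separated unit before encoding, as in this sketch (the 0.9 target peak is an arbitrary assumption).

```python
import numpy as np


def enhance_and_amplify(signal, target_peak=0.9):
    """Amplify a separated voiceprint signal unit to a uniform peak
    level before it is separately encoded."""
    peak = float(np.abs(signal).max())
    if peak == 0.0:
        return signal.copy()   # silent unit: nothing to amplify
    return signal * (target_peak / peak)
```

Normalizing each unit this way gives every stored audio file a comparable loudness regardless of how far each speaker sat from the microphone.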
It can be seen that, in this embodiment, at least one clear, individually storable audio file can be separated from a single recording file, making it easy for the user to hear the recording content clearly.
Further, before the voiceprint feature data are extracted from the recorded audio signal in the above step 401, the sound signal collected while the user is recording also needs to undergo noise reduction processing to obtain the recorded audio signal.
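The noise reduction algorithm is likewise left open; a common single-channel choice that could fill this step is spectral subtraction, sketched below under the assumption that the noise spectrum can be estimated from a speech-free stretch of the same recording.

```python
import numpy as np


def spectral_subtract(collected, noise_sample):
    """Denoise a collected sound signal by subtracting an estimated
    noise magnitude spectrum and keeping the noisy phase."""
    spectrum = np.fft.rfft(collected)
    noise_mag = np.abs(np.fft.rfft(noise_sample, n=len(collected)))
    # subtract the noise magnitude per bin, flooring at zero
    clean_mag = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
    phase = np.angle(spectrum)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(collected))
```

The output keeps the input's length and phase; only the magnitude spectrum is attenuated, so the total signal energy can only decrease.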
In practical application, when the voiceprint-recognition-based recording file separation apparatus provided by this embodiment separates a recording file, the process can be divided into two stages.
The first stage is the voiceprint extraction and separation stage, which specifically includes voiceprint feature extraction, voiceprint comparison, and discrimination decision-making. Voiceprint feature extraction extracts all separable voiceprint feature data from the recording file. In voiceprint comparison, the separated voiceprint feature data are compared and verified, producing a finer voiceprint separation. In discrimination decision-making, the main body signals within a preset signal range are retained and the remaining signals are automatically filtered out, yielding comparatively complete main body voiceprint signals.
The second stage is the voiceprint recombination stage. Specifically, the saved main body voiceprint signals are separately encoded, restored into independent sound signals, and stored as separate audio files.
The voiceprint-recognition-based recording file separation apparatus provided by this embodiment has the following advantages. Speech containing voiceprint features is convenient and natural to obtain, and voiceprint extraction can be completed without the user noticing, so user acceptance is high. Acquiring speech for identification is low-cost and simple, requiring only a microphone, and no extra recording equipment is needed when communication devices are used. It is suitable for remote identity confirmation: with only a microphone, a telephone, or a mobile phone, remote login can be achieved over a network (a communication network or the Internet). The algorithms for voiceprint identification and confirmation have low complexity. In combination with other measures, such as content discrimination through speech recognition, the accuracy can be further improved.
For example, in a conference discussion with many participants, after the recording is processed by the voiceprint-recognition-based recording file separation apparatus provided by this embodiment, each person's viewpoint can be played back automatically; alternatively, when reviewing the conference content after the meeting, the full content can be obtained simply by playing back the files, which greatly facilitates meeting-summary work. At the same time, members who did not attend the meeting can also learn the entire conference content from the recording files.
With the voiceprint-recognition-based recording file separation method provided by this embodiment, when the recorded voice data are large in amount and complex, the different voices are separated and stored individually, so that the user can hear the recording content clearly, which is convenient for the user's work and life.
It should be noted that, as used herein, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
The sequence numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the preferable implementation. Based on such an understanding, the technical solution of the present invention, or the part thereof that contributes to the prior art, can essentially be embodied in the form of a software product. This computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can guide a computer or other programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only optional embodiments of the present invention and do not thereby limit the scope of the claims of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect use thereof in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

CN201610077739.1A | 2016-02-03 | 2016-02-03 | Recording file separation method and device based on voiceprint identification | Pending | CN105719659A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610077739.1A (CN105719659A) | 2016-02-03 | 2016-02-03 | Recording file separation method and device based on voiceprint identification

Publications (1)

Publication Number | Publication Date
CN105719659A | 2016-06-29

Family

ID=56156568

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201610077739.1A | Pending, CN105719659A (en) | 2016-02-03 | 2016-02-03

Country Status (1)

Country | Link
CN (1) | CN105719659A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102347060A (en)* | 2010-08-04 | 2012-02-08 | 鸿富锦精密工业(深圳)有限公司 | Electronic recording device and method
CN102781075A (en)* | 2011-05-12 | 2012-11-14 | 中兴通讯股份有限公司 | Method for reducing communication power consumption of mobile terminal and mobile terminal
CN103165131A (en)* | 2011-12-17 | 2013-06-19 | 富泰华工业(深圳)有限公司 | Voice processing system and voice processing method
CN103514884A (en)* | 2012-06-26 | 2014-01-15 | 华为终端有限公司 | Communication voice denoising method and terminal
CN102760434A (en)* | 2012-07-09 | 2012-10-31 | 华为终端有限公司 | Method for updating voiceprint feature model and terminal
US20140142944A1 (en)* | 2012-11-21 | 2014-05-22 | Verint Systems Ltd. | Diarization Using Acoustic Labeling
CN103971696A (en)* | 2013-01-30 | 2014-08-06 | 华为终端有限公司 | Method, device and terminal equipment for processing voice
CN104123950A (en)* | 2014-07-17 | 2014-10-29 | 深圳市中兴移动通信有限公司 | Sound recording method and device
CN104123115A (en)* | 2014-07-28 | 2014-10-29 | 联想(北京)有限公司 | Audio information processing method and electronic device
CN105096937A (en)* | 2015-05-26 | 2015-11-25 | 努比亚技术有限公司 | Voice data processing method and terminal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
彭诗雅: "Research on Identity Authentication Technology Based on Voiceprint Recognition", China Master's Theses Full-text Database, Information Science and Technology *
李金宝: "Research on the Application of Wavelet Analysis in Voiceprint Feature Parameter Extraction", Wanfang Database *
裴鑫: "Research on Key Technologies of Voiceprint Recognition Systems", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106409286A (en)* | 2016-09-23 | 2017-02-15 | 努比亚技术有限公司 | Method and device for implementing audio processing
CN106448653A (en)* | 2016-09-27 | 2017-02-22 | 惠州市德赛工业研究院有限公司 | Wearable intelligent terminal
CN106453865A (en)* | 2016-09-27 | 2017-02-22 | 努比亚技术有限公司 | Mobile terminal and voice-text converting method
CN106448683A (en)* | 2016-09-30 | 2017-02-22 | 珠海市魅族科技有限公司 | Method and device for viewing recording in multimedia files
CN106792346A (en)* | 2016-11-14 | 2017-05-31 | 广东小天才科技有限公司 | Audio adjusting method and device in teaching video
CN106782500A (en)* | 2016-12-23 | 2017-05-31 | 电子科技大学 | A kind of fusion feature parameter extracting method based on pitch period and MFCC
CN107093430A (en)* | 2017-05-10 | 2017-08-25 | 哈尔滨理工大学 | A kind of vocal print feature extraction algorithm based on wavelet package transforms
CN107358963A (en)* | 2017-07-14 | 2017-11-17 | 中航华东光电(上海)有限公司 | One kind removes breathing device and method in real time
US11475907B2 | 2017-11-27 | 2022-10-18 | Goertek Technology Co., Ltd. | Method and device of denoising voice signal
WO2019100500A1 (en)* | 2017-11-27 | 2019-05-31 | 歌尔科技有限公司 | Voice signal denoising method and device
CN108074574A (en)* | 2017-11-29 | 2018-05-25 | 维沃移动通信有限公司 | Audio-frequency processing method, device and mobile terminal
CN108174236A (en)* | 2017-12-22 | 2018-06-15 | 维沃移动通信有限公司 | A media file processing method, server and mobile terminal
CN108257605A (en)* | 2018-02-01 | 2018-07-06 | 广东欧珀移动通信有限公司 | Multi-channel recording method and device and electronic equipment
CN108182945A (en)* | 2018-03-12 | 2018-06-19 | 广州势必可赢网络科技有限公司 | Voiceprint feature-based multi-person voice separation method and device
CN108492830A (en)* | 2018-03-28 | 2018-09-04 | 深圳市声扬科技有限公司 | Method for recognizing sound-groove, device, computer equipment and storage medium
US11974067B2 | 2019-06-28 | 2024-04-30 | Huawei Technologies Co., Ltd. | Conference recording method and apparatus, and conference recording system
WO2021012734A1 (en)* | 2019-07-25 | 2021-01-28 | 深圳壹账通智能科技有限公司 | Audio separation method and apparatus, electronic device and computer-readable storage medium
CN110648553B (en)* | 2019-09-26 | 2021-05-28 | 北京声智科技有限公司 | Site reminding method, electronic equipment and computer readable storage medium
CN110648553A (en)* | 2019-09-26 | 2020-01-03 | 北京声智科技有限公司 | Site reminding method, electronic equipment and computer readable storage medium
CN112820300A (en)* | 2021-02-25 | 2021-05-18 | 北京小米松果电子有限公司 | Audio processing method and device, terminal and storage medium
CN112820300B (en)* | 2021-02-25 | 2023-12-19 | 北京小米松果电子有限公司 | Audio processing method and device, terminal and storage medium
US12119012B2 | 2021-02-25 | 2024-10-15 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Method and apparatus for voice recognition in mixed audio based on pitch features using network models, and storage medium
TWI815343B (en)* | 2021-11-24 | 2023-09-11 | 英華達股份有限公司 | Noise reduction processing method
US12114125B2 | 2021-11-24 | 2024-10-08 | Inventec Appliances (Pudong) Corporation | Noise cancellation processing method, device and apparatus
CN115557339A (en)* | 2022-08-03 | 2023-01-03 | 深聪半导体(江苏)有限公司 | Elevator calling method and system based on voice technology

Legal Events

Date | Code | Title | Description
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication | Application publication date: 2016-06-29

