CN105719659A - Recording file separation method and device based on voiceprint identification - Google Patents

Recording file separation method and device based on voiceprint identification

Info

Publication number
CN105719659A
CN105719659A
Authority
CN
China
Prior art keywords
voiceprint
audio signal
recorded audio
voiceprint feature
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610077739.1A
Other languages
Chinese (zh)
Inventor
廖娟娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd
Priority to CN201610077739.1A
Publication of CN105719659A
Legal status: Pending (current)


Abstract

Embodiments of the invention disclose a recording file separation method based on voiceprint identification. The method comprises the steps of: extracting voiceprint feature data from a recorded audio signal; comparing the voiceprint feature data with preset speech models; and, according to the comparison result, individually encoding the recording-signal units that correspond to identical voiceprint feature data and storing them as separate voice files. Embodiments of the invention further provide a recording file separation device based on voiceprint identification. In this way, when the recorded voice data is voluminous and complex, the different voices are separated and stored individually, so that the user can hear the recorded content clearly, which is convenient for the user's work and life.

Description

Recording file separation method and device based on voiceprint recognition
Technical field
The present invention relates to the field of recording technology, and in particular to a recording file separation method and device based on voiceprint recognition.
Background technology
At present, mobile terminals such as mobile phones are used to make recordings, and the recorded voice data is often voluminous and complex. For example, a conference recording generally contains the voice data of multiple meeting participants. When reviewing the conference content or preparing meeting minutes, it is necessary to listen to the recording file; however, because many people took part in the conference, the recorded content may be hard to make out, which is inconvenient for the user's work and life.
Summary of the invention
In view of this, embodiments of the present invention aim to provide a recording file separation method and device based on voiceprint recognition, so that the user can clearly hear the recorded content from a recording file, which facilitates the user's work and life.
In one aspect, embodiments of the invention provide a recording file separation device based on voiceprint recognition. The device includes: a voiceprint extraction module, a comparison module and an encoding-and-storage module;
wherein the voiceprint extraction module is configured to extract voiceprint feature data from a recorded audio signal;
the comparison module is configured to compare the voiceprint feature data with preset speech models; and
the encoding-and-storage module is configured to, according to the comparison result of the comparison module, individually encode the recorded-audio-signal units corresponding to identical voiceprint feature data and store them as separate audio files.
Optionally, the voiceprint extraction module is specifically configured to:
extract, by means of wavelet transform techniques, the following voiceprint feature data from the recorded audio signal: the pitch spectrum and its contour, the energy of pitch frames, the occurrence frequency and trajectory of pitch formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction.
Optionally, the preset speech models include at least one of the following: a vector quantization model, a stochastic model and a neural network model.
Optionally, the encoding-and-storage module is specifically configured to:
apply enhancement and amplification processing to the recorded-audio-signal units corresponding to identical voiceprint feature data, and individually encode the recorded-audio-signal units after the enhancement and amplification processing.
Optionally, the device further includes a noise reduction module configured to perform noise reduction on the collected audio signal to obtain the recorded audio signal.
In another aspect, embodiments of the invention provide a recording file separation method based on voiceprint recognition. The method includes:
extracting voiceprint feature data from a recorded audio signal;
comparing the voiceprint feature data with preset speech models; and
according to the comparison result, individually encoding the recorded-audio-signal units corresponding to identical voiceprint feature data and storing them as separate audio files.
Optionally, extracting the voiceprint feature data from the recorded audio signal includes:
extracting, by means of wavelet transform techniques, the following voiceprint feature data from the recorded audio signal: the pitch spectrum and its contour, the energy of pitch frames, the occurrence frequency and trajectory of pitch formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction.
Optionally, the preset speech models include at least one of the following: a vector quantization model, a stochastic model and a neural network model.
Optionally, individually encoding the recorded-audio-signal units corresponding to identical voiceprint feature data includes:
applying enhancement and amplification processing to the recorded-audio-signal units corresponding to identical voiceprint feature data; and
individually encoding the recorded-audio-signal units after the enhancement and amplification processing.
Optionally, before extracting the voiceprint feature data from the recorded audio signal, the method further includes:
performing noise reduction on the collected audio signal to obtain the recorded audio signal.
In the recording file separation method and device based on voiceprint recognition provided by embodiments of the present invention, the recording file separation device extracts voiceprint feature data from the recorded audio signal; compares the voiceprint feature data with preset speech models; and, according to the comparison result, individually encodes the recorded-audio-signal units corresponding to identical voiceprint feature data and stores them as separate audio files. In this way, when the recorded voice data is voluminous and complex, the different voices are separated and stored individually, so that the user can hear the recorded content clearly, which is convenient for the user's work and life.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of an optional mobile terminal implementing embodiments of the present invention;
Fig. 2 is a schematic diagram of a communication system in which the mobile terminal provided by embodiments of the present invention can operate;
Fig. 3 is a schematic structural diagram of the recording file separation device based on voiceprint recognition provided by an embodiment of the present invention;
Fig. 4 is a flow chart of the recording file separation method based on voiceprint recognition provided by an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings.
A mobile terminal implementing the embodiments of the present invention is now described with reference to Fig. 1. In the following description, suffixes such as "module", "part" or "unit" used to denote elements are only for facilitating the explanation of the present invention and have no specific meaning in themselves; therefore, "module" and "part" can be used interchangeably.
Mobile terminals can be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, smart phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (PAD), portable media players (PMP) and navigation devices, as well as fixed terminals such as digital TVs and desktop computers. In the following it is assumed that the terminal is a mobile terminal; however, those skilled in the art will understand that, except for elements used specifically for mobile purposes, the structure according to the embodiments of the present invention can also be applied to fixed-type terminals.
Fig. 1 is a schematic diagram of the hardware structure of an optional mobile terminal implementing embodiments of the present invention.
The mobile terminal 100 may include a wireless communication unit 110, an audio/video (A/V) input unit 120, a user input unit 130, an output unit 150, a memory 160, an interface unit 170, a controller 180 and a power supply unit 190, among others. Fig. 1 shows a mobile terminal with various components, but it should be understood that not all illustrated components are required; more or fewer components may alternatively be implemented. The elements of the mobile terminal are described in detail below.
The wireless communication unit 110 generally includes one or more components that allow radio communication between the mobile terminal 100 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a mobile communication module 112, a wireless Internet module 113, a short-range communication module 114 and a location information module 115.
The mobile communication module 112 sends radio signals to and/or receives radio signals from at least one of a base station (e.g., an access point, a Node B, etc.), an external terminal and a server. Such radio signals may include voice call signals, video call signals, or various types of data sent and/or received as text and/or multimedia messages.
The wireless Internet module 113 supports wireless Internet access for the mobile terminal. This module can be internally or externally coupled to the terminal. The wireless Internet access technologies involved may include WLAN (Wi-Fi), WiBro (wireless broadband), WiMAX (worldwide interoperability for microwave access), HSDPA (high-speed downlink packet access), and so on.
The short-range communication module 114 is a module for supporting short-range communication. Some examples of short-range communication technologies include Bluetooth(TM), radio frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), ZigBee(TM), and so on.
The location information module 115 is a module for checking or obtaining the location information of the mobile terminal. A typical example of the location information module 115 is GPS (global positioning system). According to current technology, the GPS module 115 calculates distance information from three or more satellites along with accurate time information, and applies triangulation to the calculated information, thereby accurately calculating three-dimensional current location information in terms of longitude, latitude and altitude. Currently, methods for calculating position and time information use three satellites and correct the error of the calculated position and time information by using one additional satellite. In addition, the GPS module 115 can calculate speed information by continuously calculating the current location information in real time.
The A/V input unit 120 is used to receive audio or video signals. The A/V input unit 120 may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capture device in a video capture mode or an image capture mode. The processed image frames can be displayed on a display unit 151, stored in the memory 160 (or another storage medium) or transmitted via the wireless communication unit 110; two or more cameras 121 can be provided depending on the structure of the mobile terminal. The microphone 122 can receive sound (audio data) in operating modes such as a phone call mode, a recording mode and a speech recognition mode, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data can be converted into a format that can be sent to a mobile communication base station via the mobile communication module 112. The microphone 122 can implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference produced while receiving and transmitting audio signals.
The user input unit 130 can generate key input data according to commands input by the user to control various operations of the mobile terminal. The user input unit 130 allows the user to input various types of information and may include a keyboard, a touch pad (e.g., a touch-sensitive component that detects changes in resistance, pressure, capacitance, etc. caused by being touched), and so on. In particular, when the touch pad is superimposed on the display unit 151 as a layer, a touch screen can be formed.
The interface unit 170 serves as an interface through which at least one external device can be connected to the mobile terminal 100. For example, the external devices may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on. The interface unit 170 can be used to receive input (e.g., data, information, power, etc.) from an external device and transfer the received input to one or more elements in the mobile terminal 100, or to transfer data between the mobile terminal and an external device.
The output unit 150 may include a display module 151 and an audio output module 152, among others.
The display unit 151 can display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in the phone call mode, the display unit 151 can display a user interface (UI) or a graphical user interface (GUI) related to a call or other communication (e.g., text messaging, multimedia file downloading, etc.). When the mobile terminal 100 is in a video call mode or an image capture mode, the display unit 151 can display captured and/or received images, a UI or GUI showing the video or image and related functions, and so on.
Meanwhile, when the display module 151 and the touch pad are superimposed on each other as layers to form a touch screen, the display module 151 can serve as both an input device and an output device. The display module 151 may include at least one of a liquid crystal display (LCD), a thin-film-transistor LCD (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, and so on. Some of these displays may be constructed to be transparent so that the user can view through them from the outside; these can be called transparent displays, and a typical transparent display is, for example, a TOLED (transparent organic light-emitting diode) display. Depending on the desired embodiment, the mobile terminal 100 may include two or more display units (or other display devices); for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen can be used to detect touch input pressure as well as touch input position and touch input area.
The audio output module 152 can, when the mobile terminal is in a mode such as a call-signal reception mode, a call mode, a recording mode, a speech recognition mode or a broadcast reception mode, convert audio data received by the wireless communication unit 110 or stored in the memory 160 into audio signals and output them as sound. Moreover, the audio output module 152 can provide audio output related to specific functions performed by the mobile terminal 100 (e.g., call-signal reception sound, message reception sound, etc.). The audio output module 152 may include a speaker, a buzzer, and so on.
The memory 160 can store software programs for the processing and control operations performed by the controller 180, or temporarily store data that has been or is to be output. Moreover, the memory 160 can store data about the vibrations and audio signals of various patterns output when a touch is applied to the touch screen.
The memory 160 may include at least one type of storage medium, including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and so on. Moreover, the mobile terminal 100 can cooperate with a network storage device that performs the storage function of the memory 160 over a network connection.
The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs control and processing related to voice calls, data communication, video calls, and so on. In addition, the controller 180 may include a multimedia module 181 for reproducing (or playing back) multimedia data; the multimedia module 181 can be constructed within the controller 180 or separately from it. The controller 180 can perform pattern recognition processing so as to recognize handwriting input or picture-drawing input performed on the touch screen as characters or images.
The power supply unit 190 receives external power or internal power under the control of the controller 180 and provides the appropriate power required to operate each element and component.
The various embodiments described herein can be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein can be implemented by using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, such embodiments can be implemented in the controller 180. For a software implementation, embodiments such as processes or functions can be implemented with separate software modules each of which performs at least one function or operation. The software code can be implemented by a software application (or program) written in any suitable programming language, and can be stored in the memory 160 and executed by the controller 180.
So far, the mobile terminal has been described in terms of its functions. In the following, for the sake of brevity, a slide-type mobile terminal among the various types of mobile terminals (folder-type, bar-type, swing-type, slide-type, etc.) is described as an example. Accordingly, the present invention can be applied to any type of mobile terminal and is not limited to slide-type mobile terminals.
The mobile terminal 100 as shown in Fig. 1 may be constructed to operate with wired and wireless communication systems as well as satellite-based communication systems that transmit data via frames or packets.
A communication system in which a mobile terminal according to the present invention can operate is now described with reference to Fig. 2.
Such communication systems can use different air interfaces and/or physical layers. Air interfaces used by communication systems include, for example, frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), the universal mobile telecommunications system (UMTS) (in particular, Long Term Evolution (LTE)), the global system for mobile communications (GSM), and so on. As a non-limiting example, the following description relates to a CDMA communication system, but such teachings apply equally to other types of systems.
Referring to Fig. 2, a CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of base stations (BS) 270, base station controllers (BSC) 275 and a mobile switching center (MSC) 280. The MSC 280 is configured to form an interface with a public switched telephone network (PSTN) 290. The MSC 280 is also configured to form an interface with the BSCs 275, which can be coupled to the base stations 270 via backhaul links. The backhaul links can be constructed according to any of several known interfaces, including, for example, E1/T1, ATM, IP, PPP, frame relay, HDSL, ADSL or xDSL. It will be appreciated that the system as shown in Fig. 2 may include a plurality of BSCs 275.
Each BS 270 can serve one or more sectors (or regions), each sector being covered by an omnidirectional antenna or an antenna pointing in a specific direction radially away from the BS 270. Alternatively, each sector can be covered by two or more antennas for diversity reception. Each BS 270 may be constructed to support multiple frequency assignments, each frequency assignment having a specific spectrum (e.g., 1.25 MHz, 5 MHz, etc.).
The intersection of a sector and a frequency assignment can be called a CDMA channel. The BS 270 can also be called a base transceiver subsystem (BTS) or another equivalent term. In such a case, the term "base station" can be used to broadly denote a single BSC 275 and at least one BS 270. A base station can also be called a "cell site"; alternatively, each sector of a specific BS 270 can be referred to as a cell site.
In Fig. 2, several global positioning system (GPS) satellites 300 are shown. The satellites 300 help locate at least one of the plurality of mobile terminals 100.
Although a plurality of satellites 300 are depicted in Fig. 2, it should be understood that any number of satellites can be used to obtain useful location information. The GPS module 115 as shown in Fig. 1 is generally constructed to cooperate with the satellites 300 to obtain the desired location information. Instead of, or in addition to, GPS tracking technology, other technologies that can track the location of the mobile terminal can be used. Moreover, at least one GPS satellite 300 can optionally or additionally process satellite DMB transmissions.
As a typical operation of the wireless communication system, the BSs 270 receive reverse-link signals from various mobile terminals 100. The mobile terminals 100 usually participate in calls, messaging and other types of communication. Each reverse-link signal received by a certain base station 270 is processed within that BS 270, and the resulting data is forwarded to the relevant BSC 275. The BSC provides call resource allocation and mobility management functions, including coordination of soft handoff processes between the BSs 270. The BSCs 275 also route the received data to the MSC 280, which provides additional routing services for forming an interface with the PSTN 290. Similarly, the PSTN 290 forms an interface with the MSC 280, the MSC 280 forms an interface with the BSCs 275, and the BSCs 275 correspondingly control the BSs 270 to send forward-link signals to the mobile terminals 100.
Based on the above mobile terminal hardware structure and communication system, the embodiments of the method of the present invention are now proposed.
Fig. 3 is a schematic structural diagram of the recording file separation device based on voiceprint recognition provided by an embodiment of the present invention. As shown in Fig. 3, the device provided by this embodiment may include: a voiceprint extraction module 31, a comparison module 32 and an encoding-and-storage module 33.
The voiceprint extraction module 31 is configured to extract voiceprint feature data from a recorded audio signal;
the comparison module 32 is configured to compare the voiceprint feature data with preset speech models; and
the encoding-and-storage module 33 is configured to, according to the comparison result of the comparison module 32, individually encode the recorded-audio-signal units corresponding to identical voiceprint feature data and store them as separate audio files.
First of all, it should be noted that a so-called voiceprint is the spectrum of sound waves carrying verbal information, as displayed by an electroacoustic instrument. The production of human speech is a complex physiological and physical process between the body's language centers and the vocal organs. The vocal organs a person uses when speaking — the tongue, teeth, larynx, lungs and nasal cavity — vary widely in size and form from person to person, so the voiceprint spectrograms of any two people always differ.
Everyone's speech acoustic features have both relative stability and variability; they are not absolute and unchanging. This variation may come from physiology, pathology, psychology, imitation or disguise, and is also related to environmental interference. Nevertheless, because no two people's vocal organs are exactly the same, under ordinary circumstances people can still distinguish the voices of different speakers or judge whether two voices belong to the same person.
Further, voiceprint features are acoustic features related to the anatomical structure of the human speech-production mechanism, such as spectrum, cepstrum, formants, pitch and reflection coefficients, as well as nasal sounds, breathy sounds, hoarse sounds, laughter, and so on. Human voiceprint features are also influenced by socioeconomic status, education level, birthplace, semantics, rhetoric, pronunciation, speaking habits, and so on.
As for voiceprint features — including personal characteristics influenced by one's parents, and rhythm, cadence, speed, intonation and volume — from the perspective of mathematical modeling, the features currently usable by automatic voiceprint recognition models include: acoustic features, such as the cepstrum; lexical features, such as speaker-dependent word n-grams and phoneme n-grams; prosodic features, such as n-grams describing pitch and energy "gestures"; language, dialect and accent information; and channel information, such as which channel is used.
In practical application, while the user is recording, the voiceprint extraction module 31 is specifically configured to extract, by means of wavelet transform techniques, the following voiceprint feature data from the recorded audio signal: the pitch spectrum and its contour, the energy of pitch frames, the occurrence frequency and trajectory of pitch formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, Mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction.
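As a rough illustration of per-frame feature extraction, the sketch below computes log energy and an autocorrelation-based pitch estimate for each frame of a mono signal. This is a deliberately simplified stand-in for the wavelet-based feature set listed above: the function name, frame sizes and the two features chosen are assumptions for illustration, not part of the patent.

```python
import math

def frame_features(samples, frame_len=400, hop=200, sample_rate=8000):
    """Split a mono signal into frames and compute simple per-frame features.

    Returns a list of (log_energy, pitch_hz) tuples.  Illustrative only:
    log energy stands in for the pitch-frame energy feature, and the
    autocorrelation-peak pitch estimate stands in for the pitch spectrum.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-12)
        # crude pitch estimate: lag of the autocorrelation peak in 50-400 Hz
        best_lag, best_r = 0, 0.0
        for lag in range(sample_rate // 400, sample_rate // 50):
            r = sum(frame[i] * frame[i - lag] for i in range(lag, frame_len))
            if r > best_r:
                best_r, best_lag = r, lag
        pitch = sample_rate / best_lag if best_lag else 0.0
        features.append((log_energy, pitch))
    return features
```

For a pure 200 Hz tone sampled at 8 kHz, every frame's pitch estimate lands on (or very near) 200 Hz, since the autocorrelation peaks at the 40-sample period.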
In this embodiment, the preset speech models include at least one of the following: a vector quantization model, a stochastic model and a neural network model. The comparison module 32 compares the saved voiceprint feature data with the preset speech models; that is, among the saved voiceprint feature data, the data that matches a given preset speech model is treated as identical voiceprint feature data. In this way the main voice signals within the preset signal range can be retained and the remaining signals automatically filtered out, thereby obtaining relatively complete voiceprint signals for every main speaker.
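Of the three model types named, vector quantization is the simplest to sketch: each preset speaker model is a small codebook of centroid vectors, and a sequence of feature frames is matched to the speaker whose codebook yields the lowest average distortion. The function names, codebooks and two-dimensional toy features below are illustrative assumptions, not the patent's actual models.

```python
def vq_distortion(frames, codebook):
    """Average distortion of feature frames against one speaker's VQ codebook:
    for each frame, the squared distance to its nearest codeword."""
    total = 0.0
    for f in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(f, c)) for c in codebook)
    return total / len(frames)

def match_speaker(frames, models):
    """Return the name of the preset speaker model with the lowest distortion.

    `models` maps a speaker label to that speaker's codebook (list of
    centroid vectors).
    """
    return min(models, key=lambda name: vq_distortion(frames, models[name]))
```

A stochastic model (e.g., a Gaussian mixture) or a neural network would replace the distortion score with a likelihood or a classifier output, but the comparison step — score each preset model, keep the best match — stays the same.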
The encoding-and-storage module 33 is specifically configured to: apply enhancement and amplification processing to the recorded-audio-signal units corresponding to identical voiceprint feature data, and individually encode the recorded-audio-signal units after the enhancement and amplification processing. That is, the encoding-and-storage module 33 separates out the complete voiceprint signals of all main speakers and, through enhancement and amplification processing, obtains the separated voiceprint signal units, each corresponding to identical voiceprint feature data; it then encodes each voiceprint signal unit, restores it into an independent acoustic signal, and stores it separately as an audio file.
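The encode-and-store step can be sketched with the standard-library `wave` module: each speaker's signal units are amplified by a simple gain, encoded as 16-bit PCM, and written to that speaker's own WAV file. The flat gain standing in for "enhancement and amplification processing", the file-naming scheme and the function name are all assumptions for illustration.

```python
import struct
import wave

def store_separately(segments_by_speaker, sample_rate=8000, gain=2.0, out_dir="."):
    """Write each speaker's concatenated signal units to a separate WAV file.

    `segments_by_speaker` maps a speaker label to a list of sample lists
    (floats in [-1, 1]).  Returns the list of paths written.
    """
    paths = []
    for speaker, segments in segments_by_speaker.items():
        path = f"{out_dir}/{speaker}.wav"
        with wave.open(path, "wb") as w:
            w.setnchannels(1)
            w.setsampwidth(2)          # 16-bit PCM
            w.setframerate(sample_rate)
            for seg in segments:
                # amplify, clip to [-1, 1], then encode as little-endian int16
                pcm = b"".join(
                    struct.pack("<h", int(max(-1.0, min(1.0, s * gain)) * 32767))
                    for s in seg
                )
                w.writeframes(pcm)
        paths.append(path)
    return paths
```

A real device would more likely use a compressed codec (e.g., AMR or AAC) for the individual encoding; WAV is used here only because it needs no dependency beyond the standard library.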
It follows that, in this embodiment, at least one clear, individually storable audio file can be separated out of a single recording file, making it easy for the user to hear the recorded content clearly.
Further, the device provided by this embodiment may also include a noise reduction module configured to, while the user is recording, perform noise reduction on the collected audio signal to obtain the recorded audio signal.
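The patent does not specify a noise reduction algorithm, so the sketch below uses the simplest possible placeholder: moving-average smoothing followed by a noise gate that zeroes low-amplitude samples. Function name, window size and gate threshold are assumptions; a real implementation would use spectral methods.

```python
def denoise(samples, window=5, gate=0.02):
    """Minimal noise-reduction sketch: smooth with a moving average, then
    zero any sample whose magnitude falls below the gate threshold.
    Illustrates the noise reduction module's role in producing the recorded
    audio signal; not the patent's actual algorithm."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return [s if abs(s) >= gate else 0.0 for s in smoothed]
```

Low-level background noise is removed entirely, while a signal well above the gate passes through essentially unchanged.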
In practical application, when the recording file separation device based on voiceprint recognition provided by this embodiment performs recording file separation, the process can be divided into two stages.
The first stage is the voiceprint extraction and separation stage, which specifically includes voiceprint feature extraction, voiceprint comparison and the discrimination decision. Voiceprint feature extraction means that the voiceprint extraction module 31 extracts all separable voiceprint feature data from the recording file; voiceprint comparison means comparing and verifying the separated voiceprint feature data for a finer voiceprint separation; and the discrimination decision means retaining the main voice signals within the preset signal range while automatically filtering out the remaining signals, obtaining relatively complete voiceprint signals for every main speaker.
The second stage is the voiceprint recombination stage. Specifically, the multiple saved main-speaker voiceprint signals are individually encoded, restored into independent acoustic signals, and stored separately as audio files.
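The two stages can be strung together in a toy end-to-end flow: stage one labels each frame with the closest preset model, stage two regroups the frames by label so each speaker's signal can be stored separately. The one-speaker-per-frame assumption and the single-number "voiceprint" (mean absolute amplitude) are gross simplifications made only so the control flow fits in a few lines.

```python
def separate_recording(samples, models, frame_len=400, hop=400):
    """Two-stage sketch of the separation flow described above.

    `models` maps a speaker label to a toy scalar voiceprint.  Returns a
    dict mapping each detected speaker label to that speaker's frames.
    """
    # Stage 1: voiceprint extraction, comparison and discrimination decision
    labelled = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        feat = sum(abs(x) for x in frame) / frame_len
        label = min(models, key=lambda m: abs(models[m] - feat))
        labelled.append((label, frame))
    # Stage 2: voiceprint recombination - regroup frames per speaker
    grouped = {}
    for label, frame in labelled:
        grouped.setdefault(label, []).append(frame)
    return grouped
```

Each group returned by stage two would then be fed to the encoding-and-storage step to produce one independent audio file per speaker.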
Applying the recording file separation device based on voiceprint recognition provided by this embodiment has the following advantages: speech containing voiceprint features is easy and natural to obtain, and voiceprint extraction can be completed without the speaker's conscious effort, so user acceptance is high; acquiring speech for recognition is low in cost and simple to use — a single microphone suffices, and no extra recording equipment is needed when communication devices are used; it is suitable for remote identity confirmation, since only a microphone, telephone or mobile phone is needed to realize remote login over a network (a communication network or the Internet); the algorithms for voiceprint identification and confirmation have low complexity; and, combined with other measures such as content discrimination through speech recognition, the accuracy rate can be further improved.
For example, when a conference discussion with many participants is recorded, applying the recording file separation device based on voiceprint recognition provided by this embodiment makes it possible to play back each person's viewpoints automatically; or, after the meeting, when reviewing the conference content, simply playing back the files reveals the full content, which greatly facilitates meeting-minutes work. Meanwhile, members who did not attend the meeting can also learn the entire conference content from the recording files.
With the recording file separation device based on voiceprint recognition provided by this embodiment, when the recorded voice data is voluminous and complex, the different voices are separated and stored individually, so that the user can hear the recorded content clearly, which is convenient for the user's work and life.
In practical applications, the voiceprint extraction module 31, the comparison module 32, and the encoding and storage module 33 can all be implemented by a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field-Programmable Gate Array (FPGA), or the like in the voiceprint-recognition-based recording file separation apparatus.
Fig. 4 is a flowchart of the voiceprint-recognition-based recording file separation method provided by an embodiment of the present invention. As shown in Fig. 4, the method provided by this embodiment may be executed by a voiceprint-recognition-based recording file separation apparatus, which may be arranged in a mobile terminal such as a mobile phone or a tablet computer. Specifically, the method provided by this embodiment may include:
Step 401: extract the voiceprint feature data from the recorded audio signal.
Step 402: compare the voiceprint feature data with preset speech models.
Step 403: according to the comparison result, separately encode the recorded-signal units corresponding to identical voiceprint feature data and store them as independent audio files.
It should first be noted that a so-called voiceprint is the spectrum of sound waves carrying verbal information, as displayed by an electro-acoustic instrument. The generation of human speech is a complex physiological and physical process involving the body's language center and the vocal organs. The vocal organs a person uses when speaking, namely the tongue, teeth, larynx, lungs, and nasal cavity, vary greatly from person to person in size and form, so the voiceprint spectrograms of any two people always differ.
A person's speech acoustic features have both relative stability and variability; they are not absolute or unchanging. This variation may come from physiology, pathology, psychology, imitation, or disguise, and is also related to environmental interference. Nevertheless, because each person's vocal organs differ, under ordinary circumstances people can still distinguish the voices of different people or judge whether two voices belong to the same person.
Further, voiceprint features include acoustic features related to the anatomical structure of the human pronunciation mechanism, such as the spectrum, cepstrum, formants, fundamental tone, and reflection coefficients, as well as nasal sound, breathy voice, hoarseness, and laughter. Human voiceprint features are also influenced by socioeconomic status, education level, birthplace, semantics, rhetoric, pronunciation, and speech habits.
As for voiceprint features, personal characteristics or prosody influenced by one's parents, such as rhythm, speed, intonation, and volume, can be modeled mathematically. The features currently usable by automatic voiceprint recognition models include: acoustic features, such as the cepstrum; lexical features, such as speaker-dependent word n-grams and phoneme n-grams; prosodic features, such as n-grams describing fundamental tone and energy "gestures"; language, dialect, and accent information; and channel information, such as which channel is used.
Specifically, in the above step 401, while the user is recording, the following voiceprint feature data can be extracted from the recorded audio signal by wavelet transform techniques: the fundamental tone spectrum and its contour, the energy of fundamental tone frames, the occurrence frequency and trajectory of fundamental tone formants, linear prediction cepstrum coefficients, line spectrum pairs, autocorrelation and log area ratios, MFCC, and perceptual linear prediction.
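As a concrete illustration of one feature in this list, the fundamental tone of a voiced frame can be estimated from its autocorrelation. This sketch uses plain autocorrelation rather than the wavelet transform named above, and the frame length and the 80-400 Hz search band are illustrative assumptions.

```python
import numpy as np


def estimate_fundamental(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by
    picking the autocorrelation peak inside a plausible pitch band."""
    frame = frame - frame.mean()
    # full autocorrelation, then keep the non-negative lags only
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    shortest = int(sample_rate / fmax)   # lag of the highest allowed pitch
    longest = int(sample_rate / fmin)    # lag of the lowest allowed pitch
    best_lag = shortest + int(np.argmax(corr[shortest:longest]))
    return sample_rate / best_lag
```

A 220 Hz sine frame sampled at 16 kHz, for instance, yields an estimate within a few hertz of 220.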
In this embodiment, the preset speech models include at least one of the following: a vector quantization model, a stochastic model, and a neural network model. The extracted voiceprint feature data are compared with the preset speech models; that is, among the extracted voiceprint feature data, the data that match a preset speech model are taken as identical voiceprint feature data. In this way, the main body signals within the preset signal range can be retained and the remaining signals automatically filtered out, so that comparatively complete main body voiceprint signals are obtained.
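Of these model types, the vector quantization model is the simplest to sketch. In the hedged example below, the codebooks, the model names, and the distortion threshold are all hypothetical: feature vectors are matched to the preset model whose codebook they are closest to, and feature data matching no model closely enough are filtered out.

```python
import numpy as np


def vq_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword;
    lower distortion means a better match to that speaker model."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())


def match_preset_model(features, models, threshold=1.0):
    """Return the name of the best-matching preset model, or None so
    the signal is filtered out when nothing is within the preset range."""
    scores = {name: vq_distortion(features, cb) for name, cb in models.items()}
    best = min(scores, key=scores.get)
    return best if scores[best] <= threshold else None
```

Feature data assigned the same model name are then treated as "identical voiceprint feature data" for the encoding step.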
In the above step 403, the recorded-signal units corresponding to identical voiceprint feature data are subjected to enhancement and amplification processing, and the enhanced and amplified recorded-signal units are then separately encoded. That is, the complete main body voiceprint signals are separated out one by one and, after enhancement and amplification, the separated voiceprint signal units are obtained, each corresponding to identical voiceprint feature data; these voiceprint signal units are then encoded, restored into independent sound signals, and stored as separate audio files.
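The embodiment does not specify the enhancement and amplification algorithm; one plausible reading is a simple peak normalization applied to each separated unit before encoding, as in this sketch (the 0.9 target peak is an arbitrary assumption).

```python
import numpy as np


def enhance_and_amplify(signal, target_peak=0.9):
    """Amplify a separated voiceprint signal unit to a uniform peak
    level before it is separately encoded."""
    peak = float(np.abs(signal).max())
    if peak == 0.0:
        return signal.copy()   # silent unit: nothing to amplify
    return signal * (target_peak / peak)
```

Normalizing each unit this way gives every stored audio file a comparable loudness regardless of how far each speaker sat from the microphone.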
It can be seen that, in this embodiment, at least one clear, individually storable audio file can be separated from a single recording file, making it easy for the user to hear the recording content clearly.
Further, before the voiceprint feature data are extracted from the recorded audio signal in the above step 401, the sound signal collected while the user is recording also needs to undergo noise reduction processing to obtain the recorded audio signal.
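The noise reduction algorithm is likewise left open; a common single-channel choice that could fill this step is spectral subtraction, sketched below under the assumption that the noise spectrum can be estimated from a speech-free stretch of the same recording.

```python
import numpy as np


def spectral_subtract(collected, noise_sample):
    """Denoise a collected sound signal by subtracting an estimated
    noise magnitude spectrum and keeping the noisy phase."""
    spectrum = np.fft.rfft(collected)
    noise_mag = np.abs(np.fft.rfft(noise_sample, n=len(collected)))
    # subtract the noise magnitude per bin, flooring at zero
    clean_mag = np.maximum(np.abs(spectrum) - noise_mag, 0.0)
    phase = np.angle(spectrum)
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(collected))
```

The output keeps the input's length and phase; only the magnitude spectrum is attenuated, so the total signal energy can only decrease.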
In practical application, when the voiceprint-recognition-based recording file separation apparatus provided by this embodiment separates a recording file, the process can be divided into two stages.
The first stage is the voiceprint extraction and separation stage, which specifically includes voiceprint feature extraction, voiceprint comparison, and discrimination decision-making. Voiceprint feature extraction extracts all separable voiceprint feature data from the recording file. In voiceprint comparison, the separated voiceprint feature data are compared and verified, producing a finer voiceprint separation. In discrimination decision-making, the main body signals within a preset signal range are retained and the remaining signals are automatically filtered out, yielding comparatively complete main body voiceprint signals.
The second stage is the voiceprint recombination stage. Specifically, the saved main body voiceprint signals are separately encoded, restored into independent sound signals, and stored as separate audio files.
The voiceprint-recognition-based recording file separation apparatus provided by this embodiment has the following advantages. Speech containing voiceprint features is convenient and natural to obtain, and voiceprint extraction can be completed without the user noticing, so user acceptance is high. Acquiring speech for identification is low-cost and simple, requiring only a microphone, and no extra recording equipment is needed when communication devices are used. It is suitable for remote identity confirmation: with only a microphone, a telephone, or a mobile phone, remote login can be achieved over a network (a communication network or the Internet). The algorithms for voiceprint identification and confirmation have low complexity. In combination with other measures, such as content discrimination through speech recognition, the accuracy can be further improved.
For example, in a conference discussion with many participants, after the recording is processed by the voiceprint-recognition-based recording file separation apparatus provided by this embodiment, each person's viewpoint can be played back automatically; alternatively, when reviewing the conference content after the meeting, the full content can be obtained simply by playing back the files, which greatly facilitates meeting-summary work. At the same time, members who did not attend the meeting can also learn the entire conference content from the recording files.
With the voiceprint-recognition-based recording file separation method provided by this embodiment, when the recorded voice data are large in amount and complex, the different voices are separated and stored individually, so that the user can hear the recording content clearly, which is convenient for the user's work and life.
It should be noted that, as used herein, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. In the absence of further restrictions, an element defined by the statement "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes that element.
The sequence numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and certainly also by hardware, but in many cases the former is the preferable implementation. Based on such an understanding, the technical solution of the present invention, or the part thereof that contributes to the prior art, can essentially be embodied in the form of a software product. This computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can guide a computer or other programmable data processing device to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that realizes the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only optional embodiments of the present invention and do not thereby limit the scope of the claims of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect use thereof in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

CN201610077739.1A | 2016-02-03 | 2016-02-03 | Recording file separation method and device based on voiceprint identification | Pending | CN105719659A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201610077739.1A (CN105719659A) | 2016-02-03 | 2016-02-03 | Recording file separation method and device based on voiceprint identification

Publications (1)

Publication Number | Publication Date
CN105719659A | 2016-06-29

Family

ID=56156568

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201610077739.1A | Pending, CN105719659A (en) | 2016-02-03 | 2016-02-03

Country Status (1)

Country | Link
CN (1) | CN105719659A (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN102347060A (en)* | 2010-08-04 | 2012-02-08 | 鸿富锦精密工业(深圳)有限公司 | Electronic recording device and method
CN102781075A (en)* | 2011-05-12 | 2012-11-14 | 中兴通讯股份有限公司 | Method for reducing communication power consumption of mobile terminal and mobile terminal
CN103165131A (en)* | 2011-12-17 | 2013-06-19 | 富泰华工业(深圳)有限公司 | Voice processing system and voice processing method
CN103514884A (en)* | 2012-06-26 | 2014-01-15 | 华为终端有限公司 | Communication voice denoising method and terminal
CN102760434A (en)* | 2012-07-09 | 2012-10-31 | 华为终端有限公司 | Method for updating voiceprint feature model and terminal
US20140142944A1 (en)* | 2012-11-21 | 2014-05-22 | Verint Systems Ltd. | Diarization Using Acoustic Labeling
CN103971696A (en)* | 2013-01-30 | 2014-08-06 | 华为终端有限公司 | Method, device and terminal equipment for processing voice
CN104123950A (en)* | 2014-07-17 | 2014-10-29 | 深圳市中兴移动通信有限公司 | Sound recording method and device
CN104123115A (en)* | 2014-07-28 | 2014-10-29 | 联想(北京)有限公司 | Audio information processing method and electronic device
CN105096937A (en)* | 2015-05-26 | 2015-11-25 | 努比亚技术有限公司 | Voice data processing method and terminal

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
彭诗雅: "Research on Identity Authentication Technology Based on Voiceprint Recognition", China Master's Theses Full-text Database, Information Science and Technology *
李金宝: "Research on the Application of Wavelet Analysis in Voiceprint Feature Parameter Extraction", Wanfang Database *
裴鑫: "Research on Key Technologies of Voiceprint Recognition Systems", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN106409286A (en)* | 2016-09-23 | 2017-02-15 | 努比亚技术有限公司 | Method and device for implementing audio processing
CN106448653A (en)* | 2016-09-27 | 2017-02-22 | 惠州市德赛工业研究院有限公司 | Wearable intelligent terminal
CN106453865A (en)* | 2016-09-27 | 2017-02-22 | 努比亚技术有限公司 | Mobile terminal and voice-text converting method
CN106448683A (en)* | 2016-09-30 | 2017-02-22 | 珠海市魅族科技有限公司 | Method and device for viewing recording in multimedia files
CN106792346A (en)* | 2016-11-14 | 2017-05-31 | 广东小天才科技有限公司 | Audio adjusting method and device in teaching video
CN106782500A (en)* | 2016-12-23 | 2017-05-31 | 电子科技大学 | A kind of fusion feature parameter extracting method based on pitch period and MFCC
CN107093430A (en)* | 2017-05-10 | 2017-08-25 | 哈尔滨理工大学 | A kind of vocal print feature extraction algorithm based on wavelet package transforms
CN107358963A (en)* | 2017-07-14 | 2017-11-17 | 中航华东光电(上海)有限公司 | One kind removes breathing device and method in real time
US11475907B2 | 2017-11-27 | 2022-10-18 | Goertek Technology Co., Ltd. | Method and device of denoising voice signal
WO2019100500A1 (en)* | 2017-11-27 | 2019-05-31 | 歌尔科技有限公司 | Voice signal denoising method and device
CN108074574A (en)* | 2017-11-29 | 2018-05-25 | 维沃移动通信有限公司 | Audio-frequency processing method, device and mobile terminal
CN108174236A (en)* | 2017-12-22 | 2018-06-15 | 维沃移动通信有限公司 | A media file processing method, server and mobile terminal
CN108257605A (en)* | 2018-02-01 | 2018-07-06 | 广东欧珀移动通信有限公司 | Multi-channel recording method and device and electronic equipment
CN108182945A (en)* | 2018-03-12 | 2018-06-19 | 广州势必可赢网络科技有限公司 | Voiceprint feature-based multi-person voice separation method and device
CN108492830A (en)* | 2018-03-28 | 2018-09-04 | 深圳市声扬科技有限公司 | Method for recognizing sound-groove, device, computer equipment and storage medium
US11974067B2 | 2019-06-28 | 2024-04-30 | Huawei Technologies Co., Ltd. | Conference recording method and apparatus, and conference recording system
WO2021012734A1 (en)* | 2019-07-25 | 2021-01-28 | 深圳壹账通智能科技有限公司 | Audio separation method and apparatus, electronic device and computer-readable storage medium
CN110648553B (en)* | 2019-09-26 | 2021-05-28 | 北京声智科技有限公司 | Site reminding method, electronic equipment and computer readable storage medium
CN110648553A (en)* | 2019-09-26 | 2020-01-03 | 北京声智科技有限公司 | Site reminding method, electronic equipment and computer readable storage medium
CN112820300A (en)* | 2021-02-25 | 2021-05-18 | 北京小米松果电子有限公司 | Audio processing method and device, terminal and storage medium
CN112820300B (en)* | 2021-02-25 | 2023-12-19 | 北京小米松果电子有限公司 | Audio processing method and device, terminal and storage medium
US12119012B2 | 2021-02-25 | 2024-10-15 | Beijing Xiaomi Pinecone Electronics Co., Ltd. | Method and apparatus for voice recognition in mixed audio based on pitch features using network models, and storage medium
TWI815343B (en)* | 2021-11-24 | 2023-09-11 | 英華達股份有限公司 | Noise reduction processing method
US12114125B2 | 2021-11-24 | 2024-10-08 | Inventec Appliances (Pudong) Corporation | Noise cancellation processing method, device and apparatus
CN115557339A (en)* | 2022-08-03 | 2023-01-03 | 深聪半导体(江苏)有限公司 | Elevator calling method and system based on voice technology

Legal Events

Date | Code | Title | Description
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication | Application publication date: 2016-06-29

