Song generation method, device, terminal and storage medium
Technical field
The embodiments of the present invention relate to the field of computer technology, and in particular to a song generation method, device, terminal and storage medium.
Background
Voice-to-singing conversion refers to transforming a user's speech into the corresponding song. Internet products of this kind convert a user's voice into a song and combine it with accompaniment music to produce a work sung by the user. Such products offer entertainment and social value as well as a certain market value.
The main prior-art scheme for converting speech into a song works as follows. In the model training stage, a model is trained on the text data (lyrics, etc.) of multiple songs by a professional singer A, together with the acoustic features of singer A singing those songs, to obtain an acoustic model of singer A. In the song generation stage, voice data of a user B singing or reading a song is obtained. The lyrics are recognized from this voice data, and the acoustic features of user B are extracted. The recognized lyrics are input into the acoustic model of singer A to obtain predicted acoustic features. The fundamental frequency and duration in the predicted acoustic features are then replaced with those in user B's acoustic features, yielding modified acoustic features. These modified features therefore contain the fundamental frequency and duration of user B and the spectrum of singer A. A song is synthesized from the modified features by statistical parametric synthesis or by sound-library concatenation. The resulting song has the voice characteristics of singer A and the pitch and rhythm of user B, so that singer A appears to imitate user B singing.
The above scheme requires acoustic model training, which demands a large amount of sample data, makes the implementation complex, and degrades sound quality. Moreover, a song synthesized this way carries the voice characteristics of the singer rather than the user, so user engagement and experience are poor.
Summary of the invention
The embodiments of the present invention provide a song generation method, device, terminal and storage medium. They convert a user's voice into a song that retains the user's own voice, without any acoustic model training.
In a first aspect, an embodiment of the present invention provides a song generation method, including:
obtaining a voice signal, entered by a user, that corresponds to a song;
obtaining standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores standard acoustic feature information of at least one song; and
storing or outputting the voice signal with the updated acoustic feature information as a target voice signal.
In a second aspect, an embodiment of the present invention further provides a song generation device, including:
a voice signal obtaining module, configured to obtain a voice signal, entered by a user, that corresponds to a song;
an acoustic feature information updating module, configured to obtain standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and to update the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores standard acoustic feature information of at least one song; and
a target voice signal determining module, configured to store or output the voice signal with the updated acoustic feature information as a target voice signal.
In a third aspect, an embodiment of the present invention further provides a song generation terminal, including:
one or more processors; and
a storage device configured to store one or more programs,
where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the song generation method according to the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the song generation method according to the first aspect.
The embodiments of the present invention proceed as follows:
- Obtain a voice signal, entered by a user, that corresponds to a song.
- Obtain the song's standard acoustic feature information from a pre-established acoustic feature template, which stores standard acoustic feature information of at least one song.
- Update the acoustic feature information of the voice signal according to the standard acoustic feature information.
- Store or output the voice signal with the updated acoustic feature information as a target voice signal.

This overcomes two problems of the prior art: converting speech into a song requires acoustic model training on a large amount of data, and the resulting song does not contain the user's own voice, which lowers user engagement and experience. The embodiments convert the user's voice into a song that retains the user's own voice, without acoustic model training and with good sound quality.
Brief description of the drawings
Fig. 1 is a flowchart of the song generation method in Embodiment one of the present invention;
Fig. 2 is a flowchart of the song generation method in Embodiment two of the present invention;
Fig. 3 is a flowchart of the song generation method in Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of the song generation device in Embodiment four of the present invention;
Fig. 5 is a schematic structural diagram of the song generation terminal in Embodiment five of the present invention.
Detailed description of embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. The specific embodiments described here only explain the present invention and do not limit it. For ease of description, the drawings show only the parts related to the present invention, not the entire structure.
Embodiment one
Fig. 1 is a flowchart of a song generation method provided by Embodiment one of the present invention. This embodiment applies to converting a user's voice into a song. The method can be executed by a song generation device, which can be implemented in software and/or hardware and is generally integrated in a song generation terminal. As shown in Fig. 1, the method of this embodiment specifically includes:
S110: obtain a voice signal, entered by the user, that corresponds to a song.
The voice signal takes specific song content as its object and is produced by the user reading the lyrics aloud or singing them. It carries various information, for example the lyrics of the particular song and acoustic feature information. The acoustic feature information includes fundamental frequency information, which reflects pitch; energy information, which reflects volume; duration information, which reflects rhythm; and so on. From this acoustic feature information, one can determine how far the user's reading or singing of the song falls short of a professional singer's rendition.
Preferably, the user sends the song generation terminal a request to enter a voice signal corresponding to a song. After receiving the request, the terminal obtains the entered voice signal, for example by turning on a microphone. The song generation terminal can be a standalone hardware device, such as a smart speaker or an interactive robot. It can also be a client installed on a terminal such as a mobile phone, notebook computer, or smart TV.
S120: obtain the standard acoustic feature information corresponding to the song from the pre-established acoustic feature template, and update the acoustic feature information of the voice signal according to the standard acoustic feature information.
The acoustic feature template is built by extracting the acoustic feature information of at least one song recorded by a professional singer, and it stores the standard acoustic feature information of the at least one song. In this embodiment, the user's voice signal for a particular song is obtained first. The standard acoustic feature information of that song is then preferably retrieved from the pre-established acoustic feature template, and the acoustic feature information of the voice signal is updated according to it.
For example, suppose a user wants a song that has both the user's own voice characteristics and the acoustic features of a professional singer. The user can enter song A into the song generation terminal by singing it. The terminal then uses its pre-stored acoustic feature template to convert the acoustic features of the user's rendition of song A into those of the professional singer. Specifically, the song that the entered voice signal corresponds to is determined from the lyrics of song A or from the user's selection. The standard acoustic feature information of that song is then retrieved from the acoustic feature template and used to update the acoustic feature information of the user's voice signal.
S130: store or output the voice signal with the updated acoustic feature information as the target voice signal.
After the update, the voice signal carries both the professional singer's standard acoustic feature information and the user's own voice characteristics. It is therefore preferably saved or output as the target voice signal.
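The embodiment leaves the storage format open. One possible realization assumes the target voice signal is held as floating-point samples in [-1, 1]. It can then be stored as a 16-bit mono PCM WAV file using only Python's standard `wave` and `struct` modules. The function names `to_wav_bytes` and `save_wav` are hypothetical.

```python
import io
import struct
import wave


def to_wav_bytes(samples, sample_rate=16000):
    """Encode float samples in [-1, 1] as a 16-bit mono PCM WAV byte string."""
    # Clip first so out-of-range samples cannot overflow the 16-bit integer range.
    pcm = b"".join(
        struct.pack("<h", int(round(max(-1.0, min(1.0, s)) * 32767))) for s in samples
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:  # wave does not close a caller-supplied file object
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


def save_wav(path, samples, sample_rate=16000):
    """Store the target voice signal on disk (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(to_wav_bytes(samples, sample_rate))
```

For output instead of storage, the same bytes could be handed to an audio player or sent over the network.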
The song generation method of this embodiment works as follows:
- Obtain the voice signal the user enters for a song.
- Obtain the song's standard acoustic feature information from the pre-established acoustic feature template, which stores standard acoustic feature information of at least one song.
- Update the acoustic feature information of the voice signal accordingly.
- Store or output the updated voice signal as the target voice signal.

This overcomes two prior-art problems: acoustic model training on a large amount of data is needed to convert speech into a song, and the resulting song lacks the user's own voice, which lowers user engagement and experience. The method converts the user's voice into a song that retains the user's own voice, without acoustic model training and with good sound quality.
On the basis of the above embodiments, before the voice signal corresponding to a song is obtained from the user, the method further includes:
extracting the acoustic feature information of each of multiple recorded songs as the standard acoustic feature information of the corresponding song; and
saving the identification information of the multiple songs, together with the corresponding standard acoustic feature information, in the acoustic feature template.
In this embodiment, the acoustic feature template is built in advance from multiple songs recorded by a professional singer. Specifically, after the recorded songs are obtained, the acoustic feature information of each song is extracted. Every song was recorded by a professional singer, so each extracted set of acoustic feature information can serve as the standard acoustic feature information of the corresponding song.
If the template held only the extracted standard acoustic feature information, there would be no key for retrieving the standard acoustic feature information of a particular song. Therefore, while each set of standard acoustic feature information is extracted, the identification information of the corresponding song is also obtained. Each song's identification information is saved in the template together with its standard acoustic feature information. The identification information may include the song title, the lyrics, or the song title combined with the professional singer's name. The song generation terminal can obtain the identification information of the song that the user's voice signal corresponds to in either of two ways: from the user's input, or by extracting it from the obtained voice signal.
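Here is a minimal sketch of such a template. It assumes that the standard acoustic feature information is kept per tone as fundamental frequency, duration, and energy, and that songs are identified by title or by lyrics. The names `ToneFeature` and `AcousticFeatureTemplate` are illustrative and do not come from the embodiment. Extracting features from the professional recordings is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ToneFeature:
    """Standard acoustic features of one tone (one lyric unit)."""
    word: str
    f0_hz: float       # fundamental frequency -> pitch
    duration_s: float  # duration -> rhythm
    energy: float      # e.g. RMS amplitude -> volume


def _normalize(text: str) -> str:
    # Case- and whitespace-insensitive key, so "Song A" and "song a" match.
    return "".join(text.lower().split())


class AcousticFeatureTemplate:
    """Maps song identification information (title or lyrics) to standard features."""

    def __init__(self) -> None:
        self._by_title: Dict[str, List[ToneFeature]] = {}
        self._title_by_lyrics: Dict[str, str] = {}

    def add_song(self, title: str, tones: List[ToneFeature]) -> None:
        key = _normalize(title)
        self._by_title[key] = list(tones)
        lyrics = "".join(t.word for t in tones)
        self._title_by_lyrics[_normalize(lyrics)] = key

    def lookup(self, title: Optional[str] = None,
               lyrics: Optional[str] = None) -> Optional[List[ToneFeature]]:
        # Prefer an explicit title (e.g. the user's selection); otherwise fall back
        # to lyrics recognized from the voice signal.
        if title is not None:
            return self._by_title.get(_normalize(title))
        if lyrics is not None:
            key = self._title_by_lyrics.get(_normalize(lyrics))
            return self._by_title.get(key) if key is not None else None
        return None
```

Keying by lyrics as well as by title supports both ways of obtaining the identification information: from user input, or from the recognized voice signal.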
Embodiment two
Fig. 2 is a flowchart of a song generation method provided by Embodiment two of the present invention. This embodiment builds on the above embodiments and optionally refines the update step. Updating the acoustic feature information of the voice signal according to the standard acoustic feature information becomes: obtaining the duration information of the voice signal, and performing a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal. Correspondingly, storing or outputting the voice signal with the updated acoustic feature information as the target voice signal becomes: storing or outputting the voice signal obtained after the time-domain audio transformation as the target voice signal. As shown in Fig. 2, the method of this embodiment specifically includes:
S210: obtain a voice signal, entered by the user, that corresponds to a song.
S220: obtain the standard acoustic feature information corresponding to the song from the pre-established acoustic feature template.
S230: obtain the duration information of the voice signal, and perform a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal.
The voice signal is a waveform that changes over time. Each character, word, or phrase in the voice signal corresponds to a segment of the waveform, and each segment has its own timing information, such as a start time, an end time, and a length. The characters, words, or phrases, together with their timing information, make up the duration information of the voice signal.
After the duration information of the voice signal is obtained, a time-domain audio transformation is performed on the voice signal according to the duration information and the standard acoustic feature information. Specifically, guided by the duration information, the waveform of the voice signal is transformed in the time domain using the standard acoustic feature information. After the transformation, the waveform's duration, fundamental frequency, and energy information match the standard duration, standard fundamental frequency, and standard energy information, respectively, in the standard acoustic feature information. In this way, the standard acoustic feature information serves as the reference for adjusting, and thus changing, the acoustic feature information of the voice signal.
Preferably, obtaining the duration information of the voice signal may include:
obtaining the lyrics contained in the voice signal through speech recognition, and obtaining the duration information of the voice signal according to the lyrics.
Specifically, after the user's voice signal is obtained, its lyrics are obtained by speech recognition. The lyrics consist of characters, words, or phrases, each with its own timing information, from which the duration information of the voice signal is derived.
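Speech recognizers that return word-level timestamps can supply this information. The sketch below assumes a hypothetical recognizer output of `(word, start_s, end_s)` tuples and turns it into validated duration information. Both the output format and the name `duration_info_from_asr` are assumptions; they do not refer to any specific ASR API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class WordDuration:
    """Timing of one recognized lyric unit (character, word or phrase)."""
    word: str
    start_s: float
    end_s: float

    @property
    def length_s(self) -> float:
        return self.end_s - self.start_s


def duration_info_from_asr(asr_words: Iterable[Tuple[str, float, float]]) -> List[WordDuration]:
    """Build duration information from (word, start_s, end_s) tuples (hypothetical ASR output)."""
    info: List[WordDuration] = []
    prev_end = 0.0
    for word, start, end in asr_words:
        if end <= start:
            raise ValueError(f"empty or reversed span for {word!r}")
        if start < prev_end:
            raise ValueError(f"span for {word!r} overlaps the previous one")
        info.append(WordDuration(word, float(start), float(end)))
        prev_end = end
    return info
```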
S240: store or output the voice signal obtained after the time-domain audio transformation as the target voice signal.
The voice signal obtained after the time-domain audio transformation contains both the user's own voice characteristics and the professional singer's acoustic feature information. It is therefore stored or output as the target voice signal.
The song generation method of this embodiment works as follows:
- Obtain the voice signal the user enters for a song.
- Obtain the song's standard acoustic feature information from the pre-established acoustic feature template.
- Obtain the duration information of the voice signal.
- Perform a time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change its acoustic feature information.
- Store or output the transformed voice signal as the target voice signal.

This overcomes two prior-art problems: acoustic model training on a large amount of data is needed to convert speech into a song, and the resulting song lacks the user's own voice, which lowers user engagement and experience. The method converts the user's voice, in the time domain, into a song that retains the user's own voice, without acoustic model training and with good sound quality.
On the basis of the above embodiments, performing the time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal, includes:
dividing the voice signal into tones according to the duration information, and performing the time-domain audio transformation on the tone-divided voice signal according to the standard fundamental frequency information, standard duration information, and standard energy information in the standard acoustic feature information. After the transformation, the fundamental frequency, duration, and energy information of the voice signal are consistent with the standard fundamental frequency, standard duration, and standard energy information, respectively.
The acoustic feature information may include the fundamental frequency information, duration information, and energy information of the voice signal. The fundamental frequency corresponds to pitch, the duration to rhythm, and the energy to volume.
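The embodiment gives no formulas for the per-tone adjustment. Assume that energy is measured as RMS amplitude. Let tone $k$ (with samples $x_k[n]$, $n = 1, \dots, N_k$) have user values $F_{0,k}, D_k, E_k$ and standard values $F^{\mathrm{std}}_{0,k}, D^{\mathrm{std}}_k, E^{\mathrm{std}}_k$. Under these assumptions, the three consistency conditions translate into the following modification factors:

```latex
\begin{aligned}
\alpha_k &= \frac{F^{\mathrm{std}}_{0,k}}{F_{0,k}}, \qquad \Delta_k = 12\log_2 \alpha_k \ \text{semitones} && \text{(pitch shift)}\\
\beta_k &= \frac{D^{\mathrm{std}}_k}{D_k} && \text{(time stretch)}\\
E_k &= \sqrt{\frac{1}{N_k}\sum_{n=1}^{N_k} x_k[n]^2}, \qquad g_k = \frac{E^{\mathrm{std}}_k}{E_k} && \text{(amplitude gain)}
\end{aligned}
```

Scaling every sample of tone $k$ by $g_k$ multiplies its RMS energy by $g_k$, so the scaled tone has energy $E^{\mathrm{std}}_k$. For example, a tone sung at 220 Hz whose standard pitch is 440 Hz needs $\alpha_k = 2$, that is, a shift of $\Delta_k = 12$ semitones.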
In this embodiment, the voice signal is divided into tones according to its duration information. Preferably, the division uses each character in the duration information and its timing information. This gives one tone per character, and each tone corresponds to a part of the voice signal.

For example, take a song whose lyrics contain 100 characters. Suppose the timing of the 1st character is t1b-t1n, the timing of the 2nd character is t2b-t2n, and so on, up to t100b-t100n for the 100th character. Then the part of the voice signal in the period t1b-t1n is the tone of the 1st character, the part in t2b-t2n is the tone of the 2nd character, and so on, up to the part in t100b-t100n, which is the tone of the 100th character. Each tone has its own fundamental frequency, duration, and energy information.

The time-domain audio transformation is then applied to the tone-divided voice signal one tone at a time, using the standard fundamental frequency, standard duration, and standard energy information in the standard acoustic feature information. After the transformation, the fundamental frequency, duration, and energy information of each tone are consistent with the corresponding standard values. In other words, a song's standard acoustic feature information stores, for each character of the lyrics, the fundamental frequency, duration, and energy information of its tone. After the time-domain audio transformation, each tone of the voice signal has fundamental frequency, duration, and energy information consistent with the standard values stored for the corresponding tone.
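The following sketch covers the tone division and the energy part of the time-domain transformation. It assumes that the signal is a list of float samples, that timing comes as one `(start_s, end_s)` span per lyric unit, that energy is RMS, and that the user's per-tone fundamental frequency is supplied by a separate pitch tracker (not shown). The sketch computes the pitch-shift and time-stretch factors but does not apply them. Changing pitch and duration independently in the time domain requires a pitch-synchronous method such as TD-PSOLA, which is outside this sketch. All function names are hypothetical.

```python
import math


def rms(samples):
    """RMS amplitude, used here as the tone's energy (an assumption)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def divide_into_tones(signal, sample_rate, spans):
    """Cut the signal into one tone per lyric unit using (start_s, end_s) spans."""
    tones = []
    for start_s, end_s in spans:
        a = int(round(start_s * sample_rate))
        b = int(round(end_s * sample_rate))
        tones.append(signal[a:b])
    return tones


def plan_tone_transform(tone, sample_rate, user_f0_hz, std):
    """Per-tone targets; std is a dict with 'f0_hz', 'duration_s' and 'energy'."""
    if not tone:
        raise ValueError("empty tone")
    energy = rms(tone)
    return {
        "pitch_ratio": std["f0_hz"] / user_f0_hz,              # to be applied by e.g. TD-PSOLA
        "stretch": std["duration_s"] / (len(tone) / sample_rate),  # likewise
        "gain": std["energy"] / energy if energy > 0 else 1.0,
    }


def apply_gain(tone, gain):
    """Energy matching: scaling samples by g scales RMS by g."""
    return [s * gain for s in tone]
```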
Embodiment three
Fig. 3 is a flowchart of a song generation method provided by Embodiment three of the present invention. This embodiment builds on the above embodiments and optionally adds one step: after the voice signal corresponding to a song is obtained from the user, and before its acoustic feature information is updated according to the standard acoustic feature information, the spectrum information of the voice signal is extracted.

The update itself includes the following:
- Obtain the duration information of the voice signal and divide the voice signal into tones according to it.
- Convert the tone-divided voice signal from the time domain to the frequency domain.
- Update the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information.

Correspondingly, storing or outputting the voice signal with the updated acoustic feature information as the target voice signal includes: obtaining the target voice signal from the updated acoustic feature information and the spectrum information, and storing or outputting the target voice signal. As shown in Fig. 3, the method of this embodiment specifically includes:
S310: obtain a voice signal, entered by the user, that corresponds to a song.
S320: extract the spectrum information of the voice signal.
In this embodiment, the spectrum information of the voice signal corresponds to its timbre and reflects the user's voice characteristics. The final song should keep the user's own voice when the voice signal is converted, so the spectrum information is extracted from the voice signal in advance.
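The embodiment does not specify how the spectrum information is represented. As a rough sketch, assume the voice signal is a list of float samples. Framed magnitude spectra with a Hann window can then be computed with a plain DFT in pure Python. A practical system would use an FFT and would usually smooth the result into a spectral envelope, for example with the CheapTrick estimator of the WORLD vocoder. The frame length and hop size below are illustrative defaults.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..n/2 of a Hann-windowed frame (naive O(n^2) DFT)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrum_info(signal, frame_len=256, hop=128):
    """Framed magnitude spectra: a crude stand-in for the timbre-bearing spectrum."""
    return [
        magnitude_spectrum(signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]
```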
S330: obtain the standard acoustic feature information corresponding to the song from the pre-established acoustic feature template.
S340: obtain the duration information of the voice signal, and divide the voice signal into tones according to it. Then convert the tone-divided voice signal from the time domain to the frequency domain, and update the acoustic feature information of the resulting frequency-domain voice signal according to the standard acoustic feature information.
The duration information is obtained and the tones are divided as described in the above embodiments. The duration information is derived from the waveform and the voice content of the voice signal in the time domain, and the voice signal is divided into tones according to it.
Besides being updated in the time domain, the acoustic feature information of the voice signal can also be updated in the frequency domain. Specifically, the tone-divided voice signal is converted from the time domain to the frequency domain one tone at a time, giving a frequency-domain representation of each tone. The acoustic feature information of the voice signal in the frequency domain is determined from these representations. It is then updated according to the standard acoustic feature information, yielding the updated acoustic feature information.
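Here is a minimal sketch of the per-tone frequency-domain analysis, assuming each tone is a list of float samples. It takes the strongest non-DC bin as a crude fundamental-frequency estimate, which is adequate only for a clean, single-pitched tone; real pitch trackers are more robust. It derives RMS energy from the spectrum via Parseval's theorem and reads the duration from the sample count. The function name and the layout of the returned dictionary are assumptions.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def tone_features_from_spectrum(tone, sample_rate):
    """Frequency-domain acoustic features of one tone."""
    n = len(tone)
    if n < 4:
        raise ValueError("tone too short to analyse")
    spec = dft(tone)
    half = spec[1 : n // 2]  # positive frequencies, DC excluded
    peak_bin = 1 + max(range(len(half)), key=lambda k: abs(half[k]))
    return {
        "f0_hz": peak_bin * sample_rate / n,  # crude: strongest bin, not a real F0 estimator
        "duration_s": n / sample_rate,
        # Parseval: sum|x|^2 = (1/n) sum|X|^2, so RMS = sqrt(sum|X|^2) / n.
        "energy": math.sqrt(sum(abs(c) ** 2 for c in spec)) / n,
        "spectrum": spec,
    }
```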
S350: obtain the target voice signal from the updated acoustic feature information and the spectrum information, and store or output the target voice signal.
The updated acoustic feature information is that of the professional singer, while the spectrum information reflects the user's voice characteristics. The target voice signal obtained from the two therefore contains both the user's voice characteristics and the professional singer's acoustic feature information. Once obtained, the target voice signal can be stored or output.
The song generation method of this embodiment works as follows:
- Obtain the voice signal the user enters for a song, and extract its spectrum information.
- Obtain the song's standard acoustic feature information from the pre-established acoustic feature template.
- Obtain the duration information of the voice signal and divide the voice signal into tones according to it.
- Convert the tone-divided voice signal from the time domain to the frequency domain, and update its frequency-domain acoustic feature information according to the standard acoustic feature information.
- Obtain the target voice signal from the updated acoustic feature information and the spectrum information, then store or output it.

This overcomes two prior-art problems: acoustic model training on a large amount of data is needed to convert speech into a song, and the resulting song lacks the user's own voice, which lowers user engagement and experience. The method converts the user's voice, in the frequency domain, into a song that retains the user's own voice, without acoustic model training and with good sound quality.
On the basis of the above embodiments, updating the acoustic feature information of the converted frequency-domain voice signal according to the standard acoustic feature information includes:
replacing the fundamental frequency information of the frequency-domain voice signal with the standard fundamental frequency information in the standard acoustic feature information, replacing its duration information with the standard duration information, and replacing its energy information with the standard energy information.
In this embodiment, the acoustic feature information of the voice signal in the frequency domain is determined one tone at a time from each tone's frequency-domain representation. This information includes the fundamental frequency, duration, and energy information of the voice signal. These are then replaced with the standard fundamental frequency, standard duration, and standard energy information, respectively, from the standard acoustic feature information.
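For each tone, the replacement is a straightforward field swap. This sketch assumes that the per-tone features are dictionaries with `f0_hz`, `duration_s`, and `energy` keys (a hypothetical layout), and that the user's signal and the template have the same number of tones. Any other fields, such as the user's spectrum, are left untouched.

```python
def replace_features(user_tones, standard_tones):
    """Replace F0, duration and energy of each user tone with the standard values."""
    if len(user_tones) != len(standard_tones):
        raise ValueError("tone count differs between voice signal and template")
    updated = []
    for user, std in zip(user_tones, standard_tones):
        new = dict(user)  # copy, so the caller's data and the user's spectrum stay intact
        for key in ("f0_hz", "duration_s", "energy"):
            new[key] = std[key]
        updated.append(new)
    return updated
```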
On the basis of the above embodiments, obtaining the target voice signal from the updated acoustic feature information and the spectrum information includes:
inputting the updated acoustic feature information and the spectrum information into a vocoder to obtain the target voice signal reconstructed by the vocoder.
Wherein, the acoustic feature information updated on frequency domain and the spectrum information got in advance, Wu Fazhi are utilizedIt connects to obtain corresponding targeted voice signal.It is preferred, therefore, that targeted voice signal can be restored by vocoder.Wherein, soundCode device is also referred to as speech analysis synthesis system or voice band compressibility, can use the model parameter and knot of voice signalIt closes speech synthesis technique and restores corresponding voice signal, be the coder that a kind of pair of voice is analyzed and synthesized.
In the present embodiment, the acoustic feature information obtained after update and the spectrum information being obtained ahead of time can be input to soundIn code device, vocoder is according to each parameter of input, and the speech synthesis technique for combining it internal, restores corresponding target voiceSignal.
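The embodiment does not name a particular vocoder; analysis-synthesis vocoders such as WORLD or STRAIGHT are common choices in practice. As a rough, hypothetical stand-in, the toy harmonic synthesizer below shows what a vocoder does with the inputs described here: for each frame it sums harmonics of the (standard) F0, weights them with a spectral envelope representing the user's spectrum information, and scales the frame to the (standard) energy. The frame layout, the `envelope` callable and the energy definition are assumptions of this sketch.

```python
import math

def toy_vocoder(f0_track, energy_track, envelope, sample_rate=16000, hop=160):
    """Toy harmonic synthesizer standing in for a real vocoder.

    f0_track[k], energy_track[k]: F0 (Hz) and target mean-square energy of frame k.
    envelope(freq_hz) -> relative amplitude; models the user's spectrum information.
    Frames with F0 <= 0 are treated as silence.
    """
    out, phase = [], 0.0
    for f0, energy in zip(f0_track, energy_track):
        frame = [0.0] * hop
        if f0 > 0:
            n_harm = int((sample_rate / 2) // f0)        # harmonics up to Nyquist
            amps = [envelope(h * f0) for h in range(1, n_harm + 1)]
            power = sum(a * a for a in amps) / 2         # mean square of the harmonic sum
            gain = math.sqrt(energy / power) if power > 0 else 0.0
            step = 2 * math.pi * f0 / sample_rate
            for n in range(hop):
                p = phase + step * n
                frame[n] = gain * sum(a * math.sin(h * p) for h, a in enumerate(amps, start=1))
            phase = (phase + step * hop) % (2 * math.pi)  # keep phase continuous across frames
        out.extend(frame)
    return out
```

Carrying the fundamental phase across frames avoids clicks at frame boundaries; a real vocoder additionally handles aperiodicity and smooth parameter interpolation, which this toy omits.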
Embodiment four
Fig. 4 is a schematic structural diagram of a song generating device according to Embodiment four of the present invention. As shown in Fig. 4, the song generating device of this embodiment includes:
a voice signal acquisition module 410, configured to acquire a voice signal, entered by a user, that corresponds to a song;
an acoustic feature information updating module 420, configured to obtain the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template and to update the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores standard acoustic feature information for at least one song; and
a target voice signal determining module 430, configured to store or output, as a target voice signal, the voice signal with the updated acoustic feature information.
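A minimal, hypothetical skeleton of how modules 410, 420 and 430 could be chained is shown below. The class and method names, the dictionary-based template and the pluggable `update_features` callable (standing in for either the time-domain or the frequency-domain update) are illustrative assumptions, not part of the embodiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Signal = List[float]

@dataclass
class SongGeneratingDevice:
    """Hypothetical sketch of the three modules of Embodiment four."""
    template: Dict[str, dict]                          # song ID -> standard acoustic features
    update_features: Callable[[Signal, dict], Signal]  # time- or frequency-domain update
    outputs: Dict[str, Signal] = field(default_factory=dict)

    def acquire(self, recording: Signal) -> Signal:          # module 410
        if not recording:
            raise ValueError("empty recording")
        return list(recording)

    def update(self, song_id: str, voice: Signal) -> Signal:  # module 420
        standard = self.template[song_id]                    # KeyError if song not in template
        return self.update_features(voice, standard)

    def emit(self, song_id: str, target: Signal) -> Signal:   # module 430
        self.outputs[song_id] = target                       # "store"; could also play back
        return target

    def generate(self, song_id: str, recording: Signal) -> Signal:
        return self.emit(song_id, self.update(song_id, self.acquire(recording)))
```

Keeping the update step pluggable mirrors the two alternative implementations of module 420 that the embodiment goes on to describe.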
In the song generating device provided by this embodiment, the voice signal acquisition module acquires the voice signal, entered by the user, that corresponds to a song; the acoustic feature information updating module obtains the standard acoustic feature information of that song from the pre-established acoustic feature template, which stores standard acoustic feature information for at least one song, and updates the acoustic feature information of the voice signal accordingly; and the target voice signal determining module stores or outputs the voice signal with the updated acoustic feature information as the target voice signal. This overcomes two problems of the prior art: converting speech into a song required acoustic model training on a large amount of data, and the resulting song did not contain the user's own voice, which lowered user participation and experience. The device converts the user's speech into a song that retains the user's own voice without any acoustic model training, while also ensuring good sound quality.
On the basis of the above embodiments, the acoustic feature information updating module 420 may further include:
a first duration information acquisition unit, configured to acquire the duration information corresponding to the voice signal; and
a time-domain audio transformation unit, configured to perform time-domain audio transformation on the voice signal according to the duration information and the standard acoustic feature information, so as to change the acoustic feature information of the voice signal.
The target voice signal determining module 430 may specifically include:
a first target voice signal determining unit, configured to store or output, as the target voice signal, the voice signal obtained after the time-domain audio transformation.
Further, the time-domain audio transformation unit may specifically be configured to:
divide the voice signal into tones according to the duration information, and perform time-domain audio transformation on the tone-divided voice signal according to the standard fundamental frequency information, standard duration information and standard energy information in the standard acoustic feature information, so that the fundamental frequency, duration and energy information of the transformed voice signal are consistent with the standard fundamental frequency, standard duration and standard energy information, respectively.
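One possible sketch of tone division followed by per-tone time-domain adjustment is given below, assuming the standard acoustic feature information supplies a `(duration_s, energy)` pair per tone. It uses naive linear-interpolation resampling to reach the standard duration and a gain to reach the standard energy. Note the limitation: plain resampling also shifts the pitch, so matching the standard fundamental frequency as well would need a pitch-synchronous method such as PSOLA, which this sketch deliberately omits.

```python
import math

def split_into_tones(signal, durations_s, sample_rate):
    """Tone division: cut the signal at boundaries given by per-tone durations."""
    tones, start = [], 0
    for d in durations_s:
        end = start + round(d * sample_rate)
        tones.append(signal[start:end])
        start = end
    return tones

def stretch(segment, target_len):
    """Naive linear-interpolation resampling to target_len samples (also shifts pitch)."""
    if target_len <= 0 or not segment:
        return []
    if len(segment) == 1 or target_len == 1:
        return [segment[0]] * target_len
    scale = (len(segment) - 1) / (target_len - 1)
    out = []
    for n in range(target_len):
        pos = n * scale
        i = int(pos)
        frac = pos - i
        nxt = segment[min(i + 1, len(segment) - 1)]
        out.append(segment[i] * (1 - frac) + nxt * frac)
    return out

def match_energy(segment, target_energy):
    """Scale a segment so its mean-square energy equals target_energy."""
    current = sum(x * x for x in segment) / len(segment) if segment else 0.0
    if current == 0.0:
        return list(segment)
    g = math.sqrt(target_energy / current)
    return [g * x for x in segment]

def time_domain_transform(signal, user_durations, standard, sample_rate):
    """standard: hypothetical list of (duration_s, energy) per tone from the template."""
    out = []
    tones = split_into_tones(signal, user_durations, sample_rate)
    for seg, (dur, energy) in zip(tones, standard):
        out.extend(match_energy(stretch(seg, round(dur * sample_rate)), energy))
    return out
```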
Further, the device may also include:
a spectrum information extraction module, configured to extract the spectrum information of the voice signal after the voice signal corresponding to the song is acquired from the user and before the acoustic feature information of the voice signal is updated according to the standard acoustic feature information.
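The embodiment does not fix how the spectrum information is represented; a short-time magnitude spectrum is one common assumption. The sketch below computes it with a Hann window and a direct DFT, chosen only to stay dependency-free; a real implementation would use an FFT routine. Frame length and hop size are illustrative values.

```python
import cmath
import math

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Short-time magnitude spectrum: one list of frame_len // 2 + 1 bins per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        x = [signal[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(x[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames
```

Only the magnitudes are kept here; whether phase is also retained depends on the vocoder that later consumes the spectrum information.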
The acoustic feature information updating module 420 may also include:
a second duration information acquisition unit, configured to acquire the duration information corresponding to the voice signal; and
a frequency-domain audio transformation unit, configured to divide the voice signal into tones according to the duration information, convert the tone-divided voice signal from the time domain to the frequency domain, and update the acoustic feature information of the converted frequency-domain voice signal according to the standard acoustic feature information.
The target voice signal determining module 430 may also specifically include:
a second target voice signal determining unit, configured to obtain the target voice signal according to the updated acoustic feature information and the spectrum information, and to store or output the target voice signal.
Further, the frequency-domain audio transformation unit may specifically be configured to:
replace the fundamental frequency information of the converted frequency-domain voice signal with the standard fundamental frequency information in the standard acoustic feature information, its duration information with the standard duration information, and its energy information with the standard energy information.
Further, the second target voice signal determining unit may specifically be configured to:
input the updated acoustic feature information and the spectrum information into a vocoder to obtain the target voice signal restored by the vocoder.
Further, the first duration information acquisition unit and the second duration information acquisition unit may each specifically be configured to:
obtain, through speech recognition, the lyrics information contained in the voice signal, and obtain the duration information corresponding to the voice signal according to the lyrics information.
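Assuming the speech recognizer returns each recognized lyric unit with start and end timestamps (a hypothetical output format; recognizers differ), the duration information can be derived as below, taking one tone per lyric unit. The optional comparison against the song's known lyrics is an added sanity check, not a step stated in the embodiment.

```python
from typing import List, NamedTuple, Optional

class RecognizedUnit(NamedTuple):
    """Hypothetical recognizer output for one lyric unit."""
    text: str       # the recognized lyric unit, e.g. one sung syllable
    start_s: float  # start time in the recording (seconds)
    end_s: float    # end time in the recording (seconds)

def durations_from_lyrics(units: List[RecognizedUnit],
                          expected_lyrics: Optional[List[str]] = None) -> List[float]:
    """Per-tone durations, assuming one tone per recognized lyric unit."""
    if expected_lyrics is not None and [u.text for u in units] != list(expected_lyrics):
        raise ValueError("recognized lyrics do not match the song's lyrics")
    durations = []
    for u in units:
        if u.end_s < u.start_s:
            raise ValueError(f"end precedes start for {u.text!r}")
        durations.append(u.end_s - u.start_s)
    return durations
```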
Further, the device may also include:
a standard acoustic feature information extraction module, configured to extract, before the voice signal corresponding to the song is acquired from the user, the acoustic feature information of each of a plurality of recorded songs as the standard acoustic feature information of the corresponding song; and
an acoustic feature template generation module, configured to store the identification information of the plurality of songs and the corresponding standard acoustic feature information in the acoustic feature template.
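A minimal sketch of the acoustic feature template, assuming per-tone standard features are stored as JSON keyed by the song's identification information. The field names and the JSON format are assumptions of this illustration; in the embodiment the feature values would come from analysing recorded songs rather than being supplied by hand.

```python
import json
from typing import Dict, List

REQUIRED = {"f0_hz", "duration_s", "energy"}  # assumed per-tone fields

def build_template(songs: Dict[str, List[dict]]) -> str:
    """Validate and serialize song ID -> per-tone standard features as JSON."""
    for song_id, tones in songs.items():
        for i, tone in enumerate(tones):
            missing = REQUIRED - set(tone)
            if missing:
                raise ValueError(f"{song_id} tone {i} missing {sorted(missing)}")
    return json.dumps(songs, sort_keys=True)

def lookup_standard(template_json: str, song_id: str) -> List[dict]:
    """Fetch the standard acoustic features for one song from the stored template."""
    template = json.loads(template_json)
    if song_id not in template:
        raise KeyError(f"song {song_id!r} has no standard features in the template")
    return template[song_id]
```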
The song generating device provided by the embodiments of the present invention can perform the song generation method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
Embodiment five
Fig. 5 is a schematic structural diagram of a song generating terminal according to Embodiment five of the present invention. Fig. 5 shows a block diagram of an exemplary song generating terminal 512 suitable for implementing embodiments of the present invention. The song generating terminal 512 shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.
As shown in Fig. 5, the song generating terminal 512 takes the form of a general-purpose computing device. Its components may include, but are not limited to, one or more processors 516, a memory 528, and a bus 518 connecting the different system components (including the memory 528 and the processor 516).
The bus 518 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The song generating terminal 512 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the song generating terminal 512, including volatile and non-volatile media and removable and non-removable media.
The memory 528 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 530 and/or a cache memory 532. The song generating terminal 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage device 534 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly called a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disk drive for reading from and writing to a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM or other optical media) may be provided. In these cases, each drive may be connected to the bus 518 through one or more data media interfaces. The memory 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 540 having a set of (at least one) program modules 542 may be stored, for example, in the memory 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 542 generally perform the functions and/or methods of the embodiments described in the present invention.
The song generating terminal 512 may also communicate with one or more external devices 514 (such as a keyboard, a pointing device or a display 524, where whether to provide the display 524 can be decided according to actual needs), with one or more devices that enable a user to interact with the song generating terminal 512, and/or with any device (such as a network card or modem) that enables the song generating terminal 512 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 522. The song generating terminal 512 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 520. As shown, the network adapter 520 communicates with the other modules of the song generating terminal 512 through the bus 518. It should be understood that, although not shown in Fig. 5, other hardware and/or software modules may be used in conjunction with the song generating terminal 512, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives and data backup storage systems.
The processor 516 runs the programs stored in the memory 528 to perform various functional applications and data processing, for example to implement the song generation method provided by any embodiment of the present invention.
Embodiment six
Embodiment six of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the song generation method provided by the embodiments of the present invention. The method includes:
acquiring a voice signal, entered by a user, that corresponds to a song;
obtaining the standard acoustic feature information corresponding to the song from a pre-established acoustic feature template, and updating the acoustic feature information of the voice signal according to the standard acoustic feature information, where the acoustic feature template stores standard acoustic feature information for at least one song; and
storing or outputting, as a target voice signal, the voice signal with the updated acoustic feature information.
Of course, the computer program stored on the computer-readable storage medium provided by the embodiments of the present invention is not limited to the method operations described above, and may also perform related operations of the song generation method provided by any embodiment of the present invention.
The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.