Summary of the invention
The invention provides a voice control method, a voice control apparatus, and a projector device, to at least solve the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience.
According to one aspect of the invention, a voice control method is provided, comprising: determining that a projector device has entered a speech recognition state, wherein the speech recognition state is a state in which operations are executed according to voice instructions; receiving an input voice instruction; and executing, according to the received voice instruction, an operation corresponding to the voice instruction.
Optionally, determining that the projector device has entered the speech recognition state comprises: determining that the projector device has entered the speech recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes at least one of: a touch signal of a predetermined trajectory, a sound signal, a key signal.
Optionally, executing the operation corresponding to the voice instruction according to the received voice instruction comprises: judging whether an instruction matching the voice instruction has been stored in advance; and, when the judgment result is yes, executing the operation corresponding to the voice instruction.
Optionally, before executing the operation corresponding to the voice instruction according to the received voice instruction, the method further comprises: obtaining the file names of pre-stored files and/or the application names of pre-installed applications; and storing the file names and/or the application names, wherein a file name is used to call the file corresponding to that file name according to the voice instruction, and an application name is used to call the application corresponding to that application name according to the voice instruction.
Optionally, the projector device supports receiving the voice instruction through a peripheral device, wherein the peripheral device includes at least one of: a wired headset, a Bluetooth headset.
According to another aspect of the invention, a voice control apparatus is provided, comprising: a determining module, configured to determine that a projector device has entered a speech recognition state, wherein the speech recognition state is a state in which operations are executed according to voice instructions; a receiving module, configured to receive an input voice instruction; and an executing module, configured to execute, according to the received voice instruction, an operation corresponding to the voice instruction.
Optionally, the determining module comprises: a determining unit, configured to determine that the projector device has entered the speech recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes at least one of: a touch signal of a predetermined trajectory, a sound signal, a key signal.
Optionally, the executing module comprises: a judging unit, configured to judge whether an instruction matching the voice instruction has been stored in advance; and an executing unit, configured to execute the operation corresponding to the voice instruction when the judgment result of the judging unit is yes.
Optionally, the apparatus further comprises: an obtaining module, configured to obtain the file names of pre-stored files and/or the application names of pre-installed applications; and a storage module, configured to store the file names and/or the application names, wherein a file name is used to call the file corresponding to that file name according to the voice instruction, and an application name is used to call the application corresponding to that application name according to the voice instruction.
Optionally, the projector device supports receiving the voice instruction through a peripheral device, wherein the peripheral device includes at least one of: a wired headset, a Bluetooth headset.
According to another aspect of the invention, a projector device is provided, the device at least comprising: a low-power wake-up chip, a voice engine, and a standard flow component, wherein the low-power wake-up chip is configured to enter a speech recognition state according to a wake-up instruction, the speech recognition state being a state in which operations are executed according to voice instructions; the voice engine is configured to receive an input voice instruction; and the standard flow component is configured to execute, according to the received voice instruction, an operation corresponding to the voice instruction.
With the present invention, it is determined that the projector device has entered a speech recognition state, wherein the speech recognition state is a state in which operations are executed according to voice instructions; an input voice instruction is received; and an operation corresponding to the voice instruction is executed according to the received voice instruction. This solves the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience, and thereby reduces the complexity of operating the projector and improves the user experience.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with each other.
It should be noted that the terms "first", "second", etc. in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects, and not to describe a specific order or sequence.
This embodiment provides a voice control method. Fig. 1 is a flow chart of the voice control method according to an embodiment of the present invention. As shown in Fig. 1, the flow comprises the following steps:
Step S102, determining that the projector device has entered a speech recognition state, wherein the speech recognition state is a state in which operations are executed according to voice instructions;
Step S104, receiving an input voice instruction;
Step S106, executing, according to the received voice instruction, an operation corresponding to the voice instruction.
Through the above steps, the projector device can be operated by voice instructions, which avoids tedious manual steps. This solves the problem in the related art that manually operating a projector is cumbersome and results in a poor user experience, and thereby reduces the complexity of operating the projector and improves the user experience.
In an optional embodiment, determining that the projector device has entered the speech recognition state comprises: determining that the projector device has entered the speech recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes at least one of: a touch signal of a predetermined trajectory, a sound signal, a key signal.
In an optional embodiment, executing the operation corresponding to the voice instruction according to the received voice instruction comprises: judging whether an instruction matching the voice instruction has been stored in advance; and, when the judgment result is yes, executing the operation corresponding to the voice instruction. If no instruction matching the voice instruction is stored, a message may be fed back, such as the prompt "Unable to recognize this instruction".
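The judging and executing steps above can be illustrated with a minimal sketch. It assumes the speech has already been converted to text; the function name `execute_instruction`, the `handlers` table, and the returned strings are hypothetical and are not taken from the embodiment.

```python
from typing import Callable, Dict

# Feedback returned when no stored instruction matches (wording from the example prompt above).
UNRECOGNIZED = "Unable to recognize this instruction"


def execute_instruction(text: str, handlers: Dict[str, Callable[[], str]]) -> str:
    """Judge whether a matching instruction is stored; execute it if so."""
    key = text.strip().lower()          # normalize the recognized text
    handler = handlers.get(key)         # judging step: is there a stored match?
    if handler is None:
        return UNRECOGNIZED             # no match: feed back a prompt
    return handler()                    # match: execute the corresponding operation


# Hypothetical pre-stored instructions and their operations.
handlers = {
    "open projection": lambda: "projection opened",
    "close projection": lambda: "projection closed",
}

print(execute_instruction("Open projection", handlers))
print(execute_instruction("make coffee", handlers))
```

A plain dictionary lookup is used here only because the embodiment describes a fixed set of pre-stored instructions; a real recognizer would match audio against a compiled grammar rather than compare strings.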
In an optional embodiment, before executing the operation corresponding to the voice instruction according to the received voice instruction, the method further comprises: obtaining the file names of pre-stored files and/or the application names of pre-installed applications; and storing the file names and/or the application names, wherein a file name is used to call the file corresponding to that file name according to the voice instruction, and an application name is used to call the application corresponding to that application name according to the voice instruction. The purpose of storing the file names and application names is to make it easy to call the corresponding file or application according to a voice instruction. When a new file is stored or a new application is installed, the file name of the newly stored file or the application name of the newly installed application can also be stored.
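As a rough sketch of this pre-storage step, the following builds a table of speakable "open ..." phrases from file names and application names. The phrase format, the function name `build_name_commands`, and the sample names are assumptions made for illustration only.

```python
import os
from typing import Dict, Iterable, Tuple


def build_name_commands(
    file_names: Iterable[str], app_names: Iterable[str]
) -> Dict[str, Tuple[str, str]]:
    """Map each speakable phrase to a (kind, name) target."""
    commands: Dict[str, Tuple[str, str]] = {}
    for name in file_names:
        stem, _ext = os.path.splitext(name)      # the spoken form drops the extension
        commands[f"open {stem}".lower()] = ("file", name)
    for name in app_names:
        commands[f"open {name}".lower()] = ("app", name)
    return commands


# Hypothetical initial contents of the projector.
commands = build_name_commands(["Quarterly Review.ppt"], ["Gallery"])

# When a new file is stored later, its name is added in the same way.
commands.update(build_name_commands(["holiday.mp4"], []))

print(commands["open quarterly review"])
print(commands["open holiday"])
```

In this sketch an application whose name equals a file's stem would overwrite the file entry; how such collisions are resolved is not specified in the embodiment.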
In an optional embodiment, the projector device supports receiving the voice instruction through a peripheral device, wherein the peripheral device includes at least one of: a wired headset, a Bluetooth headset.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present invention.
This embodiment also provides a voice control apparatus, which is used to implement the above embodiments and preferred implementations; what has already been explained is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a structural block diagram of the voice control apparatus according to an embodiment of the present invention. As shown in Fig. 2, the apparatus includes a determining module 22, a receiving module 24, and an executing module 26. The apparatus is described below.
The determining module 22 is configured to determine that the projector device has entered a speech recognition state, wherein the speech recognition state is a state in which operations are executed according to voice instructions. The receiving module 24 is connected to the determining module 22 and is configured to receive an input voice instruction. The executing module 26 is connected to the receiving module 24 and is configured to execute, according to the received voice instruction, an operation corresponding to the voice instruction.
Fig. 3 is a structural block diagram of the determining module 22 in the voice control apparatus according to an embodiment of the present invention. As shown in Fig. 3, the determining module 22 includes a determining unit 32. The determining module 22 is described below.
The determining unit 32 is configured to determine that the projector device has entered the speech recognition state by receiving a wake-up instruction, wherein the wake-up instruction includes at least one of: a touch signal of a predetermined trajectory, a sound signal, a key signal.
Fig. 4 is a structural block diagram of the executing module 26 in the voice control apparatus according to an embodiment of the present invention. As shown in Fig. 4, the executing module 26 includes a judging unit 42 and an executing unit 44. The executing module 26 is described below:
The judging unit 42 is configured to judge whether an instruction matching the voice instruction has been stored in advance. The executing unit 44 is connected to the judging unit 42 and is configured to execute the operation corresponding to the voice instruction when the judgment result of the judging unit 42 is yes.
Fig. 5 is a preferred structural block diagram of the voice control apparatus according to an embodiment of the present invention. As shown in Fig. 5, in addition to all the modules shown in Fig. 2, the apparatus further includes an obtaining module 52 and a storage module 54. The apparatus is described below:
The obtaining module 52 is configured to obtain the file names of pre-stored files and/or the application names of pre-installed applications. The storage module 54 is connected to the obtaining module 52 and the executing module 26 and is configured to store the file names and/or the application names, wherein a file name is used to call the file corresponding to that file name according to the voice instruction, and an application name is used to call the application corresponding to that application name according to the voice instruction.
Optionally, the projector device supports receiving the voice instruction through a peripheral device, wherein the peripheral device includes at least one of: a wired headset, a Bluetooth headset.
According to another aspect of the invention, a projector device is also provided, the device at least comprising: a low-power wake-up chip, a voice engine, and a standard flow component, wherein the low-power wake-up chip is configured to enter a speech recognition state according to a wake-up instruction, the speech recognition state being a state in which operations are executed according to voice instructions; the voice engine is configured to receive an input voice instruction; and the standard flow component is configured to execute, according to the received voice instruction, an operation corresponding to the voice instruction. The low-power wake-up chip may be connected to the voice engine, and the voice engine may be connected to the standard flow component; the low-power wake-up chip and the standard flow component may or may not be connected to each other.
In the embodiments of the present invention, the technologies involved may include the following aspects:
1. Speech recognition technology:
Speech recognition, a currently popular technology, has penetrated many fields and has opened up the progression of interaction modes from "keyboard interaction" and "touch interaction" to "voice interaction", making it possible to free people's hands and improve efficiency.
Speech recognition technology is also called automatic speech recognition (ASR). Its goal is to convert the lexical content of human speech into computer-readable input, such as key presses, binary codes, or character strings. It differs from speaker recognition and speaker verification, which attempt to identify or confirm the speaker who produced the speech rather than the lexical content it contains. Speech recognition technology lets a machine turn a speech signal into the corresponding text or command through a process of recognition and understanding. It mainly covers three aspects: feature extraction, pattern matching criteria, and model training.
According to the object being recognized, speech recognition tasks can be roughly divided into three classes: isolated word recognition, keyword spotting, and continuous speech recognition. The task of isolated word recognition is to recognize isolated words known in advance, such as "power on" and "power off". The task of continuous speech recognition is to recognize arbitrary continuous speech, such as a sentence or a passage. Keyword spotting also operates on a continuous speech stream, but it does not recognize every word; it only detects where certain known keywords occur, for example detecting the two words "computer" and "world" in a passage.
Isolated word recognition can be used in the embodiments of the present invention. The voice instructions to be supported are edited into a grammar file in advance, and the engine compiles it to generate the corresponding recognition scope. In use, only the instructions predefined in the grammar are supported.
2. Low-power wake-up:
Low-power digital signal processor (DSP) voice wake-up technology refers to the following: after the application processor (AP) of a terminal (e.g., a mobile phone) goes to sleep (i.e., the central processing unit (CPU) stops working), the dedicated processing unit of the DSP, together with a specific trigger, can wake the CPU so that it resumes work. The technology is aimed at voice control scenarios that free the hands completely: voice wake-up is developed on the basis of achieving maximum power saving while the phone system is in the sleep state. This line of work can lay the groundwork for replacing "finger + visual touch" input with "voice + auditory feedback" input, and thereby achieve a fully voice-based intelligent human-machine interaction experience.
3. Barge:
Barge refers to a technique for recognizing a specific human voice against a stationary background sound. With this function, a user of a speech recognition system does not need to wait for the "beep" before speaking, but can interrupt the prompt tone by voice at any time and go directly into speech recognition (this process is called barge-in).
The key to barge is the voice endpoint detection function. The purpose of endpoint detection is to distinguish speech signals from non-speech signals in a signal stream under a complex application environment, and to determine the beginning and end of the speech signal. A signal stream generally contains some background sound, whereas speech recognition models are trained on speech signals, so pattern matching is only meaningful between a speech signal and the speech model. Detecting the speech signal in the signal stream is therefore a necessary preprocessing step for speech recognition.
In detail, endpoint detection involves two processes:
a) Based on the features of the speech signal, parameters such as energy, zero-crossing rate, entropy, and pitch, together with their derived parameters, are used to judge which parts of the signal stream are speech and which are non-speech.
b) After a speech signal is detected in the signal stream, it is judged whether the current position is the beginning or the end of a sentence. In commercial voice systems, changing signal backgrounds and natural conversation patterns make pauses (non-speech) within a sentence more likely; in particular, there is always a silent gap before a plosive initial consonant. This beginning/end judgment is therefore especially important.
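The two processes above can be sketched with a simple frame-based detector. This is only an illustration using short-time energy and zero-crossing rate; the thresholds, the frame length, and the "hangover" rule for tolerating short pauses are assumed values and are not taken from the embodiment.

```python
from typing import List, Optional, Tuple


def frame_features(frame: List[float]) -> Tuple[float, float]:
    """Return (short-time energy, zero-crossing rate) of one frame."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return energy, crossings / (len(frame) - 1)


def detect_endpoints(
    samples: List[float],
    frame_len: int = 160,            # assumed frame size in samples
    energy_threshold: float = 0.01,  # assumed speech energy threshold
    zcr_threshold: float = 0.3,      # assumed threshold for unvoiced sounds
    hangover_frames: int = 3,        # silent frames tolerated inside a sentence
) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the first utterance, or None."""
    start: Optional[int] = None
    silent_run = 0
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        energy, zcr = frame_features(samples[i * frame_len:(i + 1) * frame_len])
        # Process a): classify the frame as speech or non-speech.
        is_speech = energy > energy_threshold or (
            energy > energy_threshold / 4 and zcr > zcr_threshold
        )
        # Process b): decide whether this is a sentence start or end.
        if start is None:
            if is_speech:
                start = i * frame_len
        elif is_speech:
            silent_run = 0               # a short pause inside the sentence ended
        else:
            silent_run += 1
            if silent_run >= hangover_frames:
                return start, (i + 1 - silent_run) * frame_len
    if start is None:
        return None
    return start, (n_frames - silent_run) * frame_len
```

The hangover counter is what keeps a brief pause, such as the silent gap before a plosive, from being treated as the end of the sentence.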
In addition, the purposes of endpoint detection also include:
a) Reducing the amount of data the recognizer has to process: this can greatly reduce the transmission volume and the computational load of the recognizer, which is important for real-time recognition in voice dialogue.
b) Rejecting non-speech signals: recognizing non-speech signals is not only a waste of resources, but may also change the state of the dialogue and confuse the user.
c) In a system that needs the barge-in function, the starting point of speech is essential. When endpoint detection finds the starting point of speech, the system stops playing the prompt tone, which completes the barge-in function.
The technical solution of this system is as follows:
While the device is asleep, the user wakes the projector with a wake-up instruction and the projector enters the speech recognition state. The wake-up instruction supports training on a user-defined recording.
The user can also wake the projector manually, for example by long-pressing the home key, to enter the speech recognition state.
Next, the user can say any preset voice instruction to tell the projector what to do next, for example: open projection, close projection, play ****, open *** (where *** is a video file name, a PPT document name, an installed application name, etc.). As long as a file is copied to the projector's storage, its name is automatically loaded into the speakable grammar; likewise, as long as an application is installed on the system, its name is automatically loaded into the speakable grammar.
If the user inputs an instruction that is not preset in the projector, the projector prompts the user that the input is wrong and re-enters the instruction input flow.
When a video starts playing, the user can control playback by voice throughout by means of the barge technique, that is, a voice instruction can be input at any time during playback. The user can say video control instructions such as: turn up the volume, turn down the volume, pause, resume playing, exit playing, etc.
When a PPT presentation starts, the user can control the presentation by voice throughout by means of the barge technique, that is, a voice instruction can be input at any time during the presentation. The user can say PPT control instructions such as: previous page, next page, first page, last page, exit full screen, play full screen, etc.
Peripheral devices are supported for voice control of the projector. After a peripheral device such as a wired headset or Bluetooth headset is connected to the projector, it can serve as the voice input device for controlling the projector. For example, the user can stand some distance away from the projector and control it by voice through a Bluetooth headset.
Throughout the flow, the projector shows user interface (UI) prompts on the projector screen, together with voice or tone prompts that tell the user when to start inputting an instruction, when input has ended, when an input is wrong, etc.
The solution of the embodiments of the present invention is described in more detail below with reference to the accompanying drawings.
Fig. 6 is a structural block diagram of the voice-controlled projector system according to an embodiment of the present invention. As shown in Fig. 6, the system mainly consists of three parts: a low-power wake-up chip module (corresponding to the low-power wakeup DSP chip in Fig. 6, i.e. the low-power wake-up chip described above), a recognition and broadcast engine module (corresponding to the voice engine in Fig. 6, i.e. the voice engine described above), and a standard flow component module (corresponding to the standard flow component in Fig. 6, i.e. the standard flow component described above). The main function of each module is as follows:
The low-power wake-up chip module is a hardware device that monitors the user's wake-up operation while the projector is asleep. The recognition and broadcast engine module is the core module for speech recognition and voice broadcast; it is responsible for recognizing the collected audio and for synthesizing the broadcast content as speech. The standard flow component module implements each concrete function point, such as voice control of video playback and voice control of opening applications; each function point exists in the form of a flow and has its own life cycle.
Fig. 7 is a low-power wake-up flow chart of the voice-controlled projector system according to an embodiment of the present invention. As shown in Fig. 7, the flow comprises the following steps:
Step S702, the user speaks the wake-up word;
Step S704, the low-power wake-up chip continuously monitors the user's voice input while the projector is asleep;
Step S706, when the user's voice input matches the preset, trained wake-up word, the low-power wake-up chip wakes the CPU and reports a wake-up event to the driver layer;
Step S708, the framework layer then notifies the application layer by means of a message;
Step S710, the application layer starts the speech recognition flow;
Step S712, end.
The low-power wake-up chip serves to free the user's hands completely, making it possible for the voice control flow to become a closed loop. Because the low-power wake-up chip is a hardware component and cannot be fitted to some projector models, the system supports trimming out this module on low-end projectors, in which case the user can wake the projector by other means, such as a peripheral device or a projector key.
Fig. 8 is a working state diagram of the voice-controlled projector system according to an embodiment of the present invention. The following description refers to Fig. 8:
After the device has been initialized and woken up, it enters the recording state and waits for the user to input a voice instruction. There are two possible user behaviors at this point: either the user says nothing and the recognition flow ends on timeout, or the user speaks and the speech is recorded by the projector, which then enters the recognition state. After entering the recognition state, if it is recognized that the user has said a correct instruction, the instruction is dispatched to the corresponding standard flow component for processing; if the instruction cannot be recognized, the user is prompted that the input is wrong and may input again or exit.
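The working states described above can be summarized as a small transition table. The state and event names below are hypothetical labels chosen for this sketch; Fig. 8 itself is not reproduced here, so the table reflects only the text of the preceding paragraph.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()          # recognition flow has ended
    RECORDING = auto()     # waiting for the user to speak
    RECOGNIZING = auto()   # speech captured, being recognized
    DISPATCHED = auto()    # handed to a standard flow component
    ERROR_PROMPT = auto()  # user told the input was wrong


# (current state, event) -> next state
TRANSITIONS = {
    (State.RECORDING, "timeout"): State.IDLE,
    (State.RECORDING, "speech_captured"): State.RECOGNIZING,
    (State.RECOGNIZING, "match"): State.DISPATCHED,
    (State.RECOGNIZING, "no_match"): State.ERROR_PROMPT,
    (State.ERROR_PROMPT, "retry"): State.RECORDING,
    (State.ERROR_PROMPT, "exit"): State.IDLE,
}


def next_state(state: State, event: str) -> State:
    """Return the next state; events not listed leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.RECORDING
for event in ["speech_captured", "no_match", "retry", "speech_captured", "match"]:
    state = next_state(state, event)
print(state.name)
```

Keeping the transitions in a table makes it straightforward to add or trim states, which fits the embodiment's emphasis on a configurable system.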
Barge-in recording is a specific recognition mode used under a stationary background sound, such as voice control during video playback. In this mode, recording stays on continuously to detect the user's voice input, and noise reduction is applied against the stationary background sound. If a voice input matching a preset dynamic instruction is detected, the engine returns the recognition result and informs the standard flow component to perform the corresponding operation, while continuing to detect the next voice input. Barge-in recording does not stop until the user exits video playback.
The embodiments of the present invention propose a voice-controlled projector system to address the problems that manual operation of projector devices is cumbersome, the user experience is poor, and the devices lack enjoyment in use. Through a combination of hardware and software, the system lets the user wake the projector by voice and issue voice instructions. The whole flow is a closed loop, that is, every step is completed by voice control with no manual operation required, which frees the user's hands and greatly improves the efficiency and enjoyment of using the projector. The system supports trimming: functions and hardware configuration can be trimmed as needed.
It should be noted that each of the above modules can be implemented by software or by hardware. In the latter case, this can be done in the following ways, without being limited to them: the above modules are all located in the same processor; or the above modules are located in multiple processors respectively.
The embodiments of the present invention also provide a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code for executing the following steps:
S1, determining that the projector device has entered a speech recognition state, wherein the speech recognition state is a state in which operations are executed according to voice instructions;
S2, receiving an input voice instruction;
S3, executing, according to the received voice instruction, an operation corresponding to the voice instruction.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media that can store program code.
Optionally, in this embodiment, the processor executes steps S1 to S3 according to the program code stored in the storage medium.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations, which are not repeated here.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be executed in an order different from the one given here, or the modules or steps can each be made into a separate integrated circuit module, or several of them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.