CN112700782A - Voice processing method and electronic equipment - Google Patents

Voice processing method and electronic equipment

Info

Publication number
CN112700782A
CN112700782A
Authority
CN
China
Prior art keywords
preset
voice
voiceprint information
information
starting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011568957.8A
Other languages
Chinese (zh)
Inventor
李俊潓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd
Priority to CN202011568957.8A
Publication of CN112700782A
Legal status: Pending

Links

Images

Landscapes

Abstract


The present application discloses a voice processing method and an electronic device, belonging to the field of electronic technology, to solve the problem that false activations of the voice wake-up algorithm occur frequently in the prior art. The voice processing method includes: when the target sound signal is greater than a first preset threshold, acquiring voiceprint information of the target sound signal; and when the voiceprint information matches preset voiceprint information, activating the voice wake-up function. The voice processing method in this application is applied in electronic devices.

Figure 202011568957

Description

Voice processing method and electronic equipment
Technical Field
The application belongs to the technical field of electronics, and particularly relates to a voice processing method and electronic equipment.
Background
Currently, electronic devices offer a voice wake-up function. To realize it, microphone data must be monitored in real time to detect whether anyone is speaking. When speech is detected, the voice wake-up function is started for verification. Once that verification passes, the device's system can be woken up to receive the user's command and respond accordingly. The user can thus control the electronic device by voice, which is convenient and fast.
Generally, any sufficiently loud speech is taken to mean that someone is speaking, but the speaker is not necessarily the device's user. In such cases the voice wake-up function is still activated, producing false activations of the voice wake-up function.
In the process of implementing the present application, the inventor found that the prior art has at least the following problem: false starts of the voice wake-up function occur at a high frequency.
Disclosure of Invention
An object of the embodiments of the present application is to provide a voice processing method, which can solve the problem in the prior art that the frequency of the false start of the voice wakeup function is high.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a speech processing method, where the method includes: acquiring voiceprint information of a target sound signal under the condition that the target sound signal is larger than a first preset threshold value; and starting a voice awakening function under the condition that the voiceprint information is matched with the preset voiceprint information.
In a second aspect, an embodiment of the present application provides a speech processing apparatus, including: the first acquisition module is used for acquiring voiceprint information of a target sound signal under the condition that the target sound signal is larger than a first preset threshold value; and the first starting module is used for starting a voice awakening function under the condition that the voiceprint information is matched with the preset voiceprint information.
In a third aspect, embodiments of the present application provide an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium on which a program or instructions are stored, which when executed by a processor, implement the steps of the method according to the first aspect.
In a fifth aspect, embodiments of the present application provide a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In this way, in the embodiment of the present application, when the target sound signal is greater than the first preset threshold, it is assumed that a person is speaking, so the voiceprint information of the target sound signal is acquired and matched against preset voiceprint information. If the matching succeeds, the speaker is presumed to be the designated person, so the voice wake-up function is started and wake-up verification is carried out. Compared with the prior art, this embodiment adds a voiceprint check, so that the voice wake-up function is started only for commands issued by the designated person (such as the user), reducing the frequency of false starts of the voice wake-up function.
Drawings
FIG. 1 is a flow chart of a speech processing method of an embodiment of the present application;
FIG. 2 is a block diagram of a speech processing apparatus according to an embodiment of the present application;
fig. 3 is a hardware configuration diagram of an electronic device according to an embodiment of the present application.
Fig. 4 is a second schematic diagram of a hardware structure of the electronic device according to the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.
The following describes the speech processing method provided by the embodiment of the present application in detail through a specific embodiment and an application scenario thereof with reference to the accompanying drawings.
FIG. 1 shows a flow diagram of a speech processing method according to an embodiment of the present application.
Step S1: and under the condition that the target sound signal is larger than a first preset threshold value, acquiring voiceprint information of the target sound signal.
In the prior art, a Voice Activity Detection (VAD) algorithm module is added at the front end of the voice wake-up pipeline. This module detects microphone data in real time: it typically judges whether speech is present by jointly analyzing the energy, spectral, and other characteristics of the input signal, or it directly measures the average energy of the ambient sound and, when that energy exceeds a preset threshold, determines that a person is currently speaking. In this way it detects whether the user is speaking and the starting time point of the speech. A voice wake-up algorithm module at the back end then realizes the voice wake-up function proper. The VAD module tells the back-end voice wake-up algorithm when to start running and where the speech begins.
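The energy branch of the VAD front end described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the frame length, sample scale, and the `energy_threshold` value are all assumptions.

```python
def average_energy(frame):
    """Mean squared amplitude of one audio frame (samples in [-1, 1])."""
    return sum(s * s for s in frame) / len(frame)

def vad_detect(frame, energy_threshold=0.01):
    """True when the frame's average energy exceeds the preset threshold,
    i.e. someone is assumed to be speaking (the 'first preset threshold')."""
    return average_energy(frame) > energy_threshold

# Illustrative frames
quiet = [0.001] * 160                    # near-silence
speech = [0.3, -0.4, 0.5, -0.2] * 40     # louder, speech-like signal
```

In a real pipeline this check would run continuously on short microphone frames, and only a positive result would trigger the heavier back-end processing.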
In the present embodiment, in combination with the prior-art VAD algorithm, it is determined that a person is currently speaking when the target sound signal is greater than the first preset threshold.
Wherein the target sound signal comprises a human sound signal.
Optionally, the first preset threshold is a user-defined threshold or a system-defined threshold, so as to distinguish whether a person is speaking currently.
Further, in the present embodiment, a corresponding voiceprint check algorithm runs simultaneously inside the VAD algorithm.
The voiceprint check algorithm is lightweight to avoid excessive power consumption, and its complexity supports dynamic adjustment to guarantee verification capability.
For reference, the VAD algorithm runs at the front end at all times; once the prior-art voice wake-up trigger fires, that is, once it is determined that a person is currently speaking, the voiceprint information of the target sound signal is obtained for the additional voiceprint check of this embodiment.
Step S2: and starting the voice awakening function under the condition that the voiceprint information is matched with the preset voiceprint information.
In this step, when the additional voiceprint check of this embodiment also passes, the VAD algorithm wakes the back-end voice wake-up algorithm and sends the voice data to it for voice wake-up verification.
The additional voiceprint check of this embodiment is: matching the acquired voiceprint information of the target sound signal against preset voiceprint information.
For reference, a similarity is computed between the voiceprint information of the target sound signal and the preset voiceprint information; if the similarity is greater than a preset threshold, the two are considered successfully matched.
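The similarity comparison can be sketched with cosine similarity over fixed-length voiceprint embeddings. The patent does not specify a similarity measure, embedding size, or threshold; the 0.8 cutoff and the vectors below are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def voiceprint_matches(embedding, preset_embedding, threshold=0.8):
    """Match succeeds when similarity exceeds the preset threshold (step S2)."""
    return cosine_similarity(embedding, preset_embedding) > threshold

enrolled = [0.9, 0.1, 0.3]        # hypothetical preset voiceprint embedding
same_speaker = [0.88, 0.12, 0.28] # a later utterance by the same speaker
other_speaker = [-0.5, 0.9, -0.1] # an utterance by someone else
```

Raising the threshold makes the check stricter, which matches the later observation that a larger threshold lowers the false-start frequency.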
Optionally, the user performs voice wakeup model training, and extracts voiceprint information of the trainer as preset voiceprint information.
Further, preset voiceprint information and matching conditions are loaded into the VAD algorithm, so that the corresponding voiceprint check algorithm is simultaneously operated in the VAD algorithm.
Conversely, if the additional voiceprint check of this embodiment fails, the back-end voice wake-up algorithm is not started and loop detection continues.
In this way, in the embodiment of the present application, when the target sound signal is greater than the first preset threshold, it is assumed that a person is speaking, so the voiceprint information of the target sound signal is acquired and matched against preset voiceprint information. If the matching succeeds, the speaker is presumed to be the designated person, so the voice wake-up function is started and wake-up verification is carried out. Compared with the prior art, this embodiment adds a voiceprint check, so that the voice wake-up function is started only for commands issued by the designated person (such as the user), reducing the frequency of false starts of the voice wake-up function.
In addition, the voice wake-up function generally runs on a low-power processing chip to reduce power consumption; after the voice wake-up verification on that chip passes, the device's system can be woken up to receive the user's command and respond accordingly. Since the voice wake-up algorithm is many times more complex than the VAD algorithm, its resource and power consumption at runtime are correspondingly many times larger. Therefore, besides reducing false starts of the voice wake-up function, the voice processing method of this embodiment prevents the low-power processor from frequently running the voice wake-up algorithm, avoiding excessive power consumption, saving power, and further improving device performance.
In the flow of the speech processing method according to another embodiment of the present application, step S2 includes any one of:
substep A1: and starting the voice awakening function under the condition that the voiceprint information is matched with the voiceprint information of the preset user.
In this step, the user can enter the voiceprint information of the user as the voiceprint information of the preset user.
The relevant threshold is user-defined. When the similarity between the voiceprint information and the preset user's voiceprint information is greater than the preset threshold, the matching succeeds.
The larger the threshold, the higher the VAD check accuracy and the lower the frequency of false starts of the voice wake-up function.
Substep A2: and starting the voice awakening function under the condition that the voiceprint information is matched with the preset type of voiceprint information.
In this step, the user can enter the voiceprint information of the user, and the type to which the voiceprint information of the user belongs is taken as the preset type of voiceprint information.
Optionally, the user's own voiceprint information is classified as a male or a female voiceprint type.
Optionally, the user's own voiceprint information is classified into the voiceprint type of an age group, such as children, adolescents, adults, or the elderly.
Optionally, different classification criteria may be combined to define the preset type. For example, the preset type may be the combination of male and elderly.
The preset type is user-defined.
The more classification criteria the preset type includes, the finer the classification, the higher the VAD check accuracy, and the lower the frequency of false starts of the voice wake-up function.
This embodiment thus provides two reference voiceprint check schemes, each matching against different preset voiceprint information. One directly checks whether the voice belongs to the registrant (such as the user); the other checks whether it belongs to a type shared with the registrant. The embodiment also lets the user dynamically choose either scheme and its parameters, further increasing VAD accuracy and preventing the excessive power consumption and false starts caused by over-triggering the back-end voice wake-up function.
In a speech processing method according to another embodiment of the present application, a voiceprint library is established locally. As the user's voice data is updated, the voiceprint library adapts to changes in the user's voiceprint, so the voiceprint information preset in the VAD algorithm is updated accordingly; the preset tracks changes in the user's voice, improving VAD check accuracy.
In a flow of the speech processing method according to another embodiment of the present application, in a case that the target sound signal is greater than the first preset threshold, the method further includes:
step B1: phoneme information of a target sound signal is acquired.
In this embodiment, the user may set a wake word. Each wake word's sound has certain characteristics of its own.
For example, consider the wake words "small V, small V" and "Hi jovi". Different wake words have different pronunciations and phonemes, so the waveforms and spectrograms that visually represent their sounds differ. Specifically, with "small V, small V" ("xiao V xiao V") as the wake word, the first phoneme of the leading syllable is "x"; with "Hi jovi", it is "h". The phonemes "x" and "h" have different speech characteristics.
Therefore, in the present embodiment, the phoneme information of the target sound signal may be acquired for matching with the preset phoneme information.
In this embodiment, a corresponding pronunciation check algorithm runs simultaneously inside the VAD algorithm.
The pronunciation check algorithm is lightweight to avoid excessive power consumption, and its complexity supports dynamic adjustment to guarantee verification capability.
For reference, the VAD algorithm always runs at the front end; once the prior-art voice wake-up trigger fires, that is, once it is determined that a person is currently speaking, the phoneme information of the target sound signal is obtained for the additional pronunciation check of this embodiment.
Step B2: and starting the voice awakening function under the condition that the phoneme information is matched with the preset phoneme information.
In this step, when the additional pronunciation check of this embodiment also passes, the VAD algorithm wakes the back-end voice wake-up function and sends the voice data to it for voice wake-up verification.
The additional pronunciation check of this embodiment is: matching the acquired phoneme information of the target sound signal against preset phoneme information.
For reference, if the phoneme information of the target sound signal contains the preset phoneme information, the matching succeeds.
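The containment rule above, matching succeeds when the detected phonemes include the preset phonemes, can be sketched as a contiguous-subsequence check. The phoneme decompositions below are hypothetical illustrations, not taken from the patent.

```python
def phonemes_match(detected, preset):
    """True when the preset phoneme sequence appears contiguously
    inside the detected phoneme sequence."""
    n = len(preset)
    return any(detected[i:i + n] == preset
               for i in range(len(detected) - n + 1))

# Hypothetical phoneme decompositions
detected_wake = ["h", "ai", "j", "o", "v", "i"]  # someone saying the wake word
detected_other = ["n", "i", "h", "au"]           # unrelated speech
PRESET = ["j", "o", "v", "i"]                    # assumed preset phoneme info
```

A production system would run this over a sliding window of recognized phonemes rather than a complete utterance.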
Optionally, when the user sets the wakeup word, extracting phoneme information of the wakeup word, and dynamically generating an acoustic model as the preset phoneme information.
Further, preset phoneme information and matching conditions are loaded into the VAD algorithm, so that the corresponding pronunciation check algorithm is simultaneously run in the VAD algorithm.
In this embodiment, while the VAD is running, if a person is detected speaking, it is further checked whether the utterance matches the preset phoneme information. If the matching succeeds, the utterance is treated as the wake word and the back-end voice wake-up verification is started; otherwise detection continues and the back-end voice wake-up function stays dormant. Compared with the prior art, this embodiment adds a pronunciation check, so the voice wake-up function is started only when the designated phrase is detected, reducing the frequency with which non-wake-word speech falsely starts it.
In the flow of the speech processing method according to another embodiment of the present application, step B2 includes:
substep C1: and acquiring a preset precision degree.
Substep C2: and starting the voice awakening function under the condition that the phoneme information is matched with the first N phonemes in the preset voice content.
Wherein, N is a positive integer, and the value of N is in direct proportion to the preset precision degree.
In the present embodiment, the preset phoneme information includes the first N phonemes in the preset speech content.
The preset voice content is the awakening word.
In this embodiment, the user can customize the accuracy of the VAD check.
The higher the precision, the more phonemes the preset phoneme information includes, and the larger the value of N.
Optionally, to guarantee VAD check accuracy, phonemes from the front of the preset voice content are appended to the preset phoneme information in order.
For example, with "small V, small V" as the wake word, the leading "x" may serve as the preset phoneme information; it may then be extended from "x" to "xi", and from "xi" to "xia", adding one phoneme at a time and thereby changing the VAD check accuracy.
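The incremental prefix scheme, one more leading phoneme per precision level, can be sketched as follows, using the "x" to "xi" to "xia" example. The phoneme split of the wake word and the mapping of precision levels to N are assumptions for illustration.

```python
WAKE_PHONEMES = ["x", "i", "a", "o"]  # assumed split of the leading syllable "xiao"

def preset_prefix(phonemes, precision):
    """First N phonemes of the wake word; N grows with the user-set
    precision level (here N == precision, clamped to a valid range)."""
    n = max(1, min(precision, len(phonemes)))
    return phonemes[:n]

def prefix_matches(detected, phonemes, precision):
    """VAD-side check: does the detected speech begin with the preset prefix?"""
    prefix = preset_prefix(phonemes, precision)
    return detected[:len(prefix)] == prefix
```

A higher precision lengthens the prefix, so fewer non-wake-word utterances pass the front-end check, at the cost of slightly more front-end work.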
Optionally, after the user sets the wakeup word and the precision degree, the N phonemes at the front end corresponding to the wakeup word may be extracted according to the phonemes included in the wakeup word and the preset precision degree, and the N phonemes are used as preset phoneme information for matching.
Correspondingly, preset wake words and accuracy are loaded into the VAD algorithm, so that the corresponding pronunciation check algorithm is simultaneously run in the VAD algorithm.
In this embodiment, the features of acoustic pronunciation are utilized to extract N pronunciations/phonemes at the front end of the wakeup word, and the N pronunciations/phonemes are checked in real time in the VAD algorithm to detect whether the user speaks a sentence related to the wakeup word, so as to reduce the start frequency of the back-end voice wakeup function and reduce the voice wakeup power consumption.
In the speech processing method according to another embodiment of the present application, a phoneme verification algorithm and a voiceprint verification algorithm may be combined to achieve more accurate speech processing, reduce false start frequency of a back-end speech wake-up function, and reduce speech wake-up power consumption.
In a flow of the speech processing method according to another embodiment of the present application, in a case that the target sound signal is greater than the first preset threshold, the method further includes:
step D1: and detecting whether the target sound signal has secondary distortion or not.
Step D2: and starting the voice awakening function under the condition that the target sound signal is not secondarily distorted.
In this embodiment, sound played by devices such as televisions and mobile phones is distinguished from sound actually spoken by a person: the former, having passed through a loudspeaker, exhibits secondary distortion and differs from a real human voice. A corresponding distortion check algorithm can be added to the VAD to distinguish recorded playback from a live speaker, preventing recorded voices from falsely triggering the voice wake-up function and defending against replay attacks with counterfeit recordings.
In the flow of the speech processing method according to another embodiment of the present application, before step S1, the method further includes:
step E1: and updating the first preset threshold value to a third preset threshold value under the condition that the environmental sound signal is greater than the second preset threshold value.
And the third preset threshold is greater than the first preset threshold.
Optionally, the second preset threshold is customized by a user, or customized by a system, so as to distinguish a scene with large noise, or a scene with weak performance of a front-end denoising algorithm.
Optionally, the third preset threshold is user-defined, or system-defined. The third preset threshold is larger than the first preset threshold, so as to improve the detection threshold of the VAD algorithm.
In this embodiment, in a noisy environment where the audio energy is high or many voices are present, the first preset threshold may be adaptively raised to the third preset threshold, adjusting the VAD detection threshold, avoiding excessive triggering of the back-end voice wake-up algorithm, reducing its false start frequency, and lowering voice wake-up power consumption.
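Step E1 can be sketched as a simple threshold swap driven by ambient energy. The numeric values of the three preset thresholds are assumptions; the patent only requires that the third exceed the first.

```python
def detection_threshold(ambient_energy,
                        first_threshold=0.01,
                        second_threshold=0.05,
                        third_threshold=0.03):
    """Return the VAD detection threshold: raised to the third preset
    threshold when ambient noise exceeds the second preset threshold."""
    assert third_threshold > first_threshold  # required by the embodiment
    if ambient_energy > second_threshold:
        return third_threshold
    return first_threshold
```

The returned value would then be used as the `energy_threshold` of the front-end speech detection, so that louder environments need proportionally louder speech to trigger it.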
In the flow of the speech processing method according to another embodiment of the present application, before step S1, the method further includes:
step F1: within a preset time period, detecting whether the target sound signal is greater than a first preset threshold.
In this embodiment, the preset time period may be a time period other than the first time period, and the first time period may be when the user sleeps or when the voice wake-up function is not used.
The preset time period can be customized by a user and can also be identified by an intelligent scene of the system.
In this embodiment, when the user sleeps or the voice wake-up function is not used, the front-end VAD algorithm or voice wake-up algorithm may be stopped, so as to reduce power consumption and false start frequency.
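The time-window gating of step F1 can be sketched as below. The 23:00-07:00 sleep window and the wrap-around handling are assumptions for illustration; the patent leaves the period to user customization or system scene recognition.

```python
def vad_enabled(hour, sleep_start=23, sleep_end=7):
    """Run front-end detection only outside the user's (assumed) sleep window;
    the window may wrap past midnight."""
    if sleep_start <= sleep_end:
        sleeping = sleep_start <= hour < sleep_end
    else:  # window wraps midnight, e.g. 23:00-07:00
        sleeping = hour >= sleep_start or hour < sleep_end
    return not sleeping
```

Skipping the VAD loop entirely during the sleep window is what saves power here: no frames are analyzed, so nothing can falsely start the back-end wake-up algorithm.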
In summary, this embodiment can distinguish whether the registrant of voice wake-up is the one speaking and whether the speech relates to the wake word, reducing how often the back-end voice wake-up algorithm is pulled up to run. At the same time, increasing the processing at the front end reduces the voice data sent to the back-end voice wake-up algorithm, lowering its complexity, improving its accuracy, cutting power consumption, and reducing false wake-ups of the system. In addition, this embodiment provides precise calibration algorithms such as the voiceprint and phoneme checks, which improve the accuracy of locating the starting point of the wake word, again lowering the complexity of the back-end voice wake-up algorithm, improving its accuracy, reducing power consumption, and reducing false wake-ups of the system.
It should be noted that the execution subject of the voice processing method provided in the embodiments of the present application may be a voice processing apparatus, or a control module in the voice processing apparatus for executing the method. In the embodiments of the present application, a voice processing apparatus executing the voice processing method is taken as the example for describing the voice processing apparatus provided herein.
Fig. 2 shows a block diagram of a speech processing apparatus according to another embodiment of the present application, including:
the first acquisition module is used for acquiring voiceprint information of the target sound signal under the condition that the target sound signal is larger than a first preset threshold value;
and the first starting module is used for starting the voice awakening function under the condition that the voiceprint information is matched with the preset voiceprint information.
In this way, in the embodiment of the present application, when the target sound signal is greater than the first preset threshold, it is assumed that a person is speaking, so the voiceprint information of the target sound signal is acquired and matched against preset voiceprint information. If the matching succeeds, the speaker is presumed to be the designated person, so the voice wake-up function is started and wake-up verification is carried out. Compared with the prior art, this embodiment adds a voiceprint check, so that the voice wake-up function is started only for commands issued by the designated person (such as the user), reducing the frequency of false starts of the voice wake-up function.
Optionally, the first activation module comprises any one of:
the first matching unit is used for starting a voice awakening function under the condition that the voiceprint information is matched with the voiceprint information of the preset user;
and the second matching unit is used for starting the voice awakening function under the condition that the voiceprint information is matched with the preset type of voiceprint information.
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring the phoneme information of the target sound signal;
and the second starting module is used for starting the voice awakening function under the condition that the phoneme information is matched with the preset phoneme information.
Optionally, the second starting module includes:
the third acquisition unit is used for acquiring a preset precision degree;
the third matching unit is used for starting a voice awakening function under the condition that the phoneme information is matched with the first N phonemes in the preset voice content; n is a positive integer, and the value of N is in direct proportion to the preset precision degree.
Optionally, the apparatus further comprises:
the updating module is used for updating the first preset threshold value to a third preset threshold value under the condition that the environmental sound signal is larger than the second preset threshold value; the third preset threshold is greater than the first preset threshold.
The voice processing device in the embodiment of the present application may be a device, and may also be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The speech processing apparatus in the embodiment of the present application may be an apparatus having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The speech processing apparatus provided in the embodiment of the present application can implement each process implemented by the foregoing method embodiment, and is not described here again to avoid repetition.
Optionally, as shown in fig. 3, an electronic device 100 is further provided in this embodiment of the present application, and includes a processor 101, a memory 102, and a program or an instruction stored in the memory 102 and executable on the processor 101, where the program or the instruction is executed by the processor 101 to implement each process of any one of the foregoing embodiments of the speech processing method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 4 is a schematic diagram of a hardware structure of an electronic device implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, a processor 1010, and the like.
Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1010 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The electronic device structure shown in fig. 4 does not constitute a limitation of the electronic device, and the electronic device may include more or fewer components than those shown, or combine some components, or have a different arrangement of components; details are omitted here.
The processor 1010 is configured to acquire voiceprint information of a target sound signal when the target sound signal is greater than a first preset threshold; and starting a voice awakening function under the condition that the voiceprint information is matched with the preset voiceprint information.
In this way, in the embodiment of the present application, when the target sound signal is greater than the first preset threshold, it is considered that a person is speaking, so the voiceprint information of the target sound signal is acquired and matched against the preset voiceprint information. If the matching succeeds, it is assumed that a designated person is speaking, so the voice wake-up function is started and wake-up-related verification is performed. It can be seen that, compared with the prior art, this embodiment adds voiceprint verification, so that the voice wake-up function is started only upon a command issued by a designated person (such as the user), which reduces the frequency of false starts of the voice wake-up function.
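The two-stage gate described above (an energy threshold followed by voiceprint matching) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the RMS energy gate, the cosine-similarity score, the concrete threshold values, and the `extract_voiceprint` callback are all assumptions made for the example.

```python
import numpy as np

# Illustrative values only; the patent does not specify concrete thresholds.
FIRST_PRESET_THRESHOLD = 0.05   # energy gate ("first preset threshold")
VOICEPRINT_MATCH_SCORE = 0.80   # similarity needed to count as a match

def rms_energy(frame: np.ndarray) -> float:
    """Root-mean-square level of an audio frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def should_start_wakeup(frame, extract_voiceprint, preset_voiceprint) -> bool:
    # Stage 1: treat the signal as speech only if it exceeds the first preset threshold.
    if rms_energy(frame) <= FIRST_PRESET_THRESHOLD:
        return False
    # Stage 2: match the extracted voiceprint against the preset (enrolled) voiceprint.
    embedding = extract_voiceprint(frame)
    return cosine_similarity(embedding, preset_voiceprint) >= VOICEPRINT_MATCH_SCORE
```

Only frames that pass both stages would hand control to the back-end wake-up algorithm, which is how the front end reduces false starts.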
Optionally, the processor 1010 is further configured to start a voice wakeup function when the voiceprint information matches voiceprint information of a preset user; and starting a voice awakening function under the condition that the voiceprint information is matched with the preset type of voiceprint information.
Optionally, the processor 1010 is further configured to obtain phoneme information of the target sound signal; and starting a voice awakening function under the condition that the phoneme information is matched with preset phoneme information.
Optionally, the processor 1010 is further configured to obtain a preset precision degree; starting a voice awakening function under the condition that the phoneme information is matched with the first N phonemes in the preset voice content; N is a positive integer, and the value of N is in direct proportion to the preset precision degree.
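The first-N-phoneme check can be sketched as below. The wake-word phoneme sequence, the 0-to-1 precision scale, and the rounding rule that maps the preset precision degree to N are assumptions for illustration; the text only requires that N be a positive integer proportional to the preset precision degree.

```python
# Hypothetical phoneme sequence for a wake word; not taken from the patent.
PRESET_WAKE_PHONEMES = ["n", "i", "h", "ao"]

def phoneme_count(precision: float, preset=PRESET_WAKE_PHONEMES) -> int:
    """Map a preset precision degree in (0, 1] to N, a positive integer
    proportional to the precision: higher precision checks more phonemes."""
    return max(1, round(precision * len(preset)))

def matches_wake_prefix(observed, precision: float, preset=PRESET_WAKE_PHONEMES) -> bool:
    """Start the wake-up function only if the observed phonemes match the
    first N phonemes of the preset voice content."""
    n = phoneme_count(precision, preset)
    return observed[:n] == preset[:n]
```

At full precision all four assumed phonemes must match; at precision 0.5 only the first two are checked, trading strictness for earlier triggering.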
Optionally, the processor 1010 is further configured to update the first preset threshold to a third preset threshold when the ambient sound signal is greater than a second preset threshold; the third preset threshold is greater than the first preset threshold.
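The ambient-noise adjustment can be sketched as a simple rule. The concrete level values below are illustrative assumptions; the only constraint taken from the text is that the third preset threshold must be greater than the first.

```python
def effective_threshold(ambient_level: float,
                        first_threshold: float = 0.05,
                        second_threshold: float = 0.20,
                        third_threshold: float = 0.10) -> float:
    """Return the energy gate to use for speech detection.

    When the ambient sound signal exceeds the second preset threshold,
    the first preset threshold is replaced by the larger third preset
    threshold, so steady background noise alone is less likely to pass
    the gate and trigger voiceprint extraction.
    """
    if third_threshold <= first_threshold:
        raise ValueError("third threshold must be greater than the first")
    return third_threshold if ambient_level > second_threshold else first_threshold
```

Raising the gate in noisy environments keeps the front end from repeatedly waking the voiceprint and phoneme stages on background noise alone.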
In summary, this embodiment can effectively distinguish whether the enrolled user is speaking and whether the speech relates to the wake word, thereby reducing how frequently the back-end voice wake-up algorithm is pulled up and run. Meanwhile, because more processing is done at the front end, less voice data is sent to the back-end voice wake-up algorithm, which reduces the complexity of the back-end algorithm, improves its accuracy, lowers power consumption, and reduces false wake-ups of the system. In addition, this embodiment provides accurate calibration algorithms such as voiceprint and phoneme matching, which can improve the accuracy of determining the starting point of the wake word, thereby further reducing the complexity of the back-end voice wake-up algorithm, improving its accuracy, lowering power consumption, and reducing false wake-ups.
It should be understood that, in the embodiment of the present application, the input unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042, and the graphics processing unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen, and may include two parts: a touch detection device and a touch controller. The other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and an operating system. The processor 1010 may integrate an application processor, which mainly handles the operating system, user interface, and applications, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of any one of the foregoing embodiments of the speech processing method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of any one of the foregoing voice processing method embodiments, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-level chip, a chip system, or a system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of speech processing, the method comprising:
acquiring voiceprint information of a target sound signal under the condition that the target sound signal is larger than a first preset threshold value;
and starting a voice awakening function under the condition that the voiceprint information is matched with the preset voiceprint information.
2. The method according to claim 1, wherein, in the case that the voiceprint information matches preset voiceprint information, starting a voice wake-up function includes any one of:
starting a voice awakening function under the condition that the voiceprint information is matched with the voiceprint information of a preset user;
and starting a voice awakening function under the condition that the voiceprint information is matched with the preset type of voiceprint information.
3. The method according to claim 1, wherein in case that the target sound signal is greater than the first preset threshold, further comprising:
acquiring phoneme information of the target sound signal;
and starting a voice awakening function under the condition that the phoneme information is matched with preset phoneme information.
4. The method according to claim 3, wherein the starting of the voice wakeup function in the case that the phoneme information matches the preset phoneme information comprises:
acquiring a preset precision degree;
starting a voice awakening function under the condition that the phoneme information is matched with the first N phonemes in the preset voice content; n is a positive integer, and the value of N is in direct proportion to the preset precision degree.
5. The method according to claim 1, wherein before acquiring the voiceprint information of the target sound signal if the target sound signal is greater than the first preset threshold, the method further comprises:
updating the first preset threshold value to a third preset threshold value when the ambient sound signal is larger than a second preset threshold value; the third preset threshold is greater than the first preset threshold.
6. A speech processing apparatus, characterized in that the apparatus comprises:
the first acquisition module is used for acquiring voiceprint information of a target sound signal under the condition that the target sound signal is larger than a first preset threshold value;
and the first starting module is used for starting a voice awakening function under the condition that the voiceprint information is matched with the preset voiceprint information.
7. The apparatus of claim 6, wherein the first initiating module comprises any one of:
the first matching unit is used for starting a voice awakening function under the condition that the voiceprint information is matched with the voiceprint information of a preset user;
and the second matching unit is used for starting a voice awakening function under the condition that the voiceprint information is matched with the preset type of voiceprint information.
8. The apparatus of claim 6, further comprising:
the second acquisition module is used for acquiring the phoneme information of the target sound signal;
and the second starting module is used for starting the voice awakening function under the condition that the phoneme information is matched with the preset phoneme information.
9. The apparatus of claim 8, wherein the second enabling module comprises:
the third acquisition unit is used for acquiring a preset precision degree;
the third matching unit is used for starting a voice awakening function under the condition that the phoneme information is matched with the first N phonemes in the preset voice content; n is a positive integer, and the value of N is in direct proportion to the preset precision degree.
10. The apparatus of claim 6, further comprising:
the updating module is used for updating the first preset threshold value to a third preset threshold value under the condition that the environmental sound signal is larger than a second preset threshold value; the third preset threshold is greater than the first preset threshold.
Application and publication data

- Application: CN202011568957.8A, "Voice processing method and electronic equipment", filed 2020-12-25 (priority date 2020-12-25), status: pending
- Publication: CN112700782A (en), published 2021-04-23
- Family ID: 75511081
- Country: CN

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN113470659A (en)*2021-05-312021-10-01翱捷科技(深圳)有限公司Light intensity-based voice awakening threshold value adjusting method and device
CN113470658A (en)*2021-05-312021-10-01翱捷科技(深圳)有限公司Intelligent earphone and voice awakening threshold value adjusting method thereof
CN113470660A (en)*2021-05-312021-10-01翱捷科技(深圳)有限公司Voice wake-up threshold adjusting method and system based on router flow
CN113763673A (en)*2021-09-132021-12-07贵州明策大数据应用策划有限公司Intelligent voice recognition alarm for caring old people
CN114187915A (en)*2021-12-162022-03-15广州城市理工学院 an interactive approach
CN114420121A (en)*2022-01-202022-04-29思必驰科技股份有限公司 Voice interaction method, electronic device and storage medium
CN115312068A (en)*2022-07-142022-11-08荣耀终端有限公司 Voice control method, device and storage medium
WO2023202442A1 (en)*2022-04-182023-10-26华为技术有限公司Method for waking up device, electronic device, and storage medium
CN118053423A (en)*2022-11-162024-05-17荣耀终端有限公司Method for waking up application program and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number / Priority date / Publication date / Assignee / Title
CN106710599A (en)*2016-12-022017-05-24深圳撒哈拉数据科技有限公司Particular sound source detection method and particular sound source detection system based on deep neural network
CN108735209A (en)*2018-04-282018-11-02广东美的制冷设备有限公司Wake up word binding method, smart machine and storage medium
CN110827836A (en)*2019-10-232020-02-21珠海格力电器股份有限公司Method and device for resetting awakening words, electronic equipment and storage medium
CN110874343A (en)*2018-08-102020-03-10北京百度网讯科技有限公司Method for processing voice based on deep learning chip and deep learning chip
CN111341325A (en)*2020-02-132020-06-26平安科技(深圳)有限公司Voiceprint recognition method and device, storage medium and electronic device
CN111354357A (en)*2018-12-242020-06-30中移(杭州)信息技术有限公司 A method, device, electronic device and storage medium for playing audio resources
CN111462756A (en)*2019-01-182020-07-28北京猎户星空科技有限公司Voiceprint recognition method and device, electronic equipment and storage medium
CN111540357A (en)*2020-04-212020-08-14海信视像科技股份有限公司Voice processing method, device, terminal, server and storage medium


Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication (application publication date: 2021-04-23)

