CN113241077A - Voice entry method and device for wearable device - Google Patents

Voice entry method and device for wearable device

Info

Publication number
CN113241077A
Authority
CN
China
Prior art keywords
voice signal
voice
wearable device
voiceprint information
consistent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110650959.XA
Other languages
Chinese (zh)
Inventor
邵雅婷
周强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Sipic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sipic Technology Co Ltd
Priority to CN202110650959.XA
Publication of CN113241077A
Current legal status: Withdrawn

Abstract

Translated from Chinese

The present invention discloses a voice entry method and device for a wearable device. The method includes: in response to acquiring a first voice signal, determining whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device; if they are consistent, enhancing and storing the first voice signal; and converting the first voice signal into text information based on the stored first voice signal. By storing the voice information that matches the voiceprint information preset in the wearable device and converting it into text, voice entry becomes more convenient and accurate and can be performed anytime and anywhere.

Figure 202110650959

Description

Voice entry method and device for wearable device
Technical Field
The invention belongs to the technical field of voice entry, and particularly relates to a voice entry method and device for wearable devices.
Background
Sound carries rich information. The most intuitive example is speech content: by analyzing what service personnel say, their service attitude and service quality can be evaluated; by monitoring a patient's breathing and coughing, it can be judged whether the patient's condition is urgent and whether first aid is needed; even the duration of a patient's snoring at night can be monitored to judge whether medical intervention is required, and so on. Voice entry can also replace traditional handwriting and typing. Compared with conventional recording methods it is fast, convenient and user-friendly: the required information can be captured anytime and anywhere simply by speaking, combined with an offline or online speech-to-text technology. At present, the devices on the market that can record sound are diverse, including traditional voice recorders that only capture audio, recorder apps on mobile phones, and the recently popular intelligent recording pens.
A recording pen includes a microphone acquisition module and a speaker playback module. With the development of speech technology, some AI recording pens also include an AI noise-reduction module and a speech transcription module, which facilitate recording in daily work, study and other scenarios.
However, such devices are inconvenient to carry: they must be held in the hand or placed on a desk or workbench for voice recording, and they are not always at hand when the user wants to use them. Voice entry environments also vary widely, and in a service hall, restaurant, factory, hospital or similar setting the recording is easily disturbed by environmental noise or other people's voices.
Disclosure of Invention
Embodiments of the invention provide a voice entry method and device for a wearable device, which are intended to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a voice entry method for a wearable device, including: in response to acquiring a first voice signal, determining whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device; if they are consistent, enhancing and storing the first voice signal; and converting the first voice signal into text information based on the stored first voice signal.
In a second aspect, an embodiment of the present invention provides a voice entry apparatus for a wearable device, including: an acquisition judging program module configured to, in response to acquiring a first voice signal, determine whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device; an enhanced storage program module configured to enhance and store the first voice signal if the two pieces of voiceprint information are consistent; and a conversion program module configured to convert the first voice signal into text information based on the stored first voice signal.
In a third aspect, an electronic device is provided, comprising: at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the voice entry method for a wearable device of any embodiment of the invention.
In a fourth aspect, embodiments of the present invention also provide a computer program product including a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions which, when executed by a computer, cause the computer to perform the steps of the voice entry method for a wearable device of any of the embodiments of the present invention.
According to the method and the device, the voice information consistent with the voiceprint information preset in the wearable device is stored and converted into text information, so that voice entry becomes more convenient and accurate and can be carried out anytime and anywhere.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a voice recording method for a wearable device according to an embodiment of the present invention;
fig. 2 is a flowchart of another speech input method for a wearable device according to an embodiment of the present invention;
fig. 3 is a flowchart of another voice recording method for a wearable device according to an embodiment of the present invention;
fig. 4 is a flowchart of monitoring the attitudes of service personnel according to a specific example of a voice recording method for a wearable device provided by an embodiment of the present invention;
fig. 5 is a block diagram of a voice recording apparatus for a wearable device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a flowchart of an embodiment of a voice entry method for a wearable device according to the present application is shown.
As shown in fig. 1, in step 101, in response to acquiring a first voice signal, it is determined whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device;
in step 102, if they are consistent, the first voice signal is enhanced and stored;
in step 103, the first voice signal is converted into text information based on the stored first voice signal.
In this embodiment, for step 101, in response to acquiring the first voice signal, the voice entry device for the wearable device determines whether the first voiceprint information corresponding to the first voice signal is consistent with the second voiceprint information preset in the wearable device. For example, after the wearable device acquires a voice signal, it determines whether the signal was uttered by the wearing user by matching the voiceprint information in the acquired signal against the preset voiceprint information.
Then, in step 102, if the first voiceprint information corresponding to the first voice signal is consistent with the second voiceprint information preset in the wearable device, the first voice signal is enhanced and stored.
Finally, for step 103, the voice entry device for the wearable device converts the first voice signal into text information based on the stored first voice signal. For example, if the wearing user is a service person, his or her speech can be recorded and converted into text at any time, documenting the service attitude and any problems that arise; if the wearing user is a patient, recordings of coughing, snoring and similar sounds can be used to generate a text case record, making it easier for medical staff to assess the patient's condition.
According to the method, the voice information consistent with the voiceprint information preset in the wearable device is stored and converted into text information, so that voice entry becomes more convenient and accurate and can be performed anytime and anywhere.
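To make the flow of steps 101 to 103 concrete, the following is a minimal Python sketch, assuming the voiceprints are fixed-length speaker-embedding vectors compared by cosine similarity. The helper names (extract_voiceprint, enhance, transcribe), the embedding size and the matching threshold are illustrative placeholders, not APIs or parameters taken from the patent.

```python
import numpy as np
from typing import List, Optional

ENROLLED_VOICEPRINT = np.random.rand(192)   # stand-in for the preset (second) voiceprint
MATCH_THRESHOLD = 0.75                      # assumed similarity threshold

def extract_voiceprint(signal: np.ndarray) -> np.ndarray:
    """Placeholder: a real system would run a speaker-embedding model here."""
    return np.random.rand(192)

def enhance(signal: np.ndarray) -> np.ndarray:
    """Placeholder for noise suppression / gain control; returns the signal unchanged."""
    return signal

def transcribe(signal: np.ndarray) -> str:
    """Placeholder for an online or offline speech-to-text engine."""
    return "<recognized text>"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def handle_voice_signal(signal: np.ndarray, storage: List[np.ndarray]) -> Optional[str]:
    """Step 101: compare voiceprints; step 102: enhance and store on a match;
    step 103: convert the stored signal into text. Non-matching audio is dropped."""
    first_voiceprint = extract_voiceprint(signal)
    if cosine_similarity(first_voiceprint, ENROLLED_VOICEPRINT) < MATCH_THRESHOLD:
        return None                         # mask and discard non-wearer audio
    enhanced = enhance(signal)
    storage.append(enhanced)                # step 102: store the enhanced signal
    return transcribe(enhanced)             # step 103: text for later use

# Feed one dummy 1-second, 16 kHz frame through the flow.
stored_signals: List[np.ndarray] = []
text = handle_voice_signal(np.zeros(16000), stored_signals)
```

In a real device the placeholders would be backed by a speaker-embedding model, a noise-suppression front end and an online or offline speech-to-text engine, as the description above suggests.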
In the method according to the foregoing embodiment, after the determining whether the first voiceprint information corresponding to the first voice signal is consistent with the second voiceprint information preset in the wearable device, the method further includes:
if the first voiceprint information corresponding to the first voice signal is inconsistent with the second voiceprint information preset in the wearable device, shielding and eliminating the first voice signal.
According to the method, the first voice signal inconsistent with the second voiceprint information preset in the wearable device is shielded and eliminated, so that external interference and sounds of non-wearers can be shielded.
Please refer to fig. 2, which shows a flowchart of another embodiment of the voice entry method for a wearable device according to the present application. The flowchart mainly includes steps that are further defined after the step of "in response to acquiring a first voice signal, determining whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device" of fig. 1.
As shown in fig. 2, in step 201, in response to acquiring a second voice signal, it is determined whether third voiceprint information corresponding to the second voice signal is consistent with fourth voiceprint information preset in the wearable device;
in step 202, if they are consistent, the first voice signal and the second voice signal are classified, stored, and converted into text based on the preset second voiceprint information and the preset fourth voiceprint information.
In this embodiment, for step 201, in response to acquiring the second voice signal, the voice entry device for the wearable device determines whether the third voiceprint information corresponding to the second voice signal is consistent with the fourth voiceprint information preset in the wearable device. For example, the preset second voiceprint information belongs to the patient wearing the device and the preset fourth voiceprint information belongs to a medical staff member; after the second voice signal is acquired, the device determines whether the third voiceprint information of the second voice signal matches the medical staff member's preset fourth voiceprint information.
Then, for step 202, if the third voiceprint information corresponding to the second voice signal is consistent with the fourth voiceprint information preset in the wearable device, the first voice signal and the second voice signal are classified, stored, and converted into text based on the preset second voiceprint information and the preset fourth voiceprint information. For example, when a medical staff member asks the patient about his or her condition, the questions and the patient's answers are converted into text separately by speaker, which makes subsequent review easier.
According to the method, the first voice signal consistent with the second voiceprint information preset in the wearable device and the second voice signal consistent with the fourth voiceprint information preset in the wearable device are stored separately and converted into text, so that speaker segmentation and clustering, as well as voice entry, can be achieved anytime and anywhere.
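A hedged sketch of this two-voiceprint variant follows: each audio segment is scored against both enrolled voiceprints and routed to the best match, so that, for example, a clinician's questions and the patient's answers end up in separate transcripts. The enrolled speaker names, the threshold and the stub helpers are assumptions for illustration only.

```python
import numpy as np
from collections import defaultdict

def extract_voiceprint(segment: np.ndarray) -> np.ndarray:
    return np.random.rand(192)              # placeholder speaker embedding

def transcribe(segment: np.ndarray) -> str:
    return "<recognized text>"              # placeholder speech-to-text call

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ENROLLED = {                                # preset second / fourth voiceprints
    "wearer": np.random.rand(192),
    "clinician": np.random.rand(192),
}
THRESHOLD = 0.75                            # assumed matching threshold

def classify_and_store(segments, transcripts=None):
    """Assign each audio segment to the best-matching enrolled speaker,
    discard segments that match nobody, and keep per-speaker transcripts."""
    transcripts = defaultdict(list) if transcripts is None else transcripts
    for seg in segments:
        emb = extract_voiceprint(seg)
        scores = {name: cosine(emb, vp) for name, vp in ENROLLED.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= THRESHOLD:
            transcripts[best].append(transcribe(seg))
    return transcripts
```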
In the method according to the foregoing embodiment, after the converting the first speech signal into text information, the method further includes:
the method comprises the steps of obtaining voice evaluation information of a wearable user of the wearable device, and converting the voice evaluation information into evaluation text information.
According to the method, the voice evaluation information of the wearable user of the wearable device is obtained, so that the service quality of service personnel can be evaluated.
Referring to fig. 3, a flowchart of a further embodiment of the voice entry method for a wearable device of the present application is shown. The flowchart mainly shows steps that further define the step of "enhance and store the first voice signal" of fig. 1, where the wearable device collects the user's voice through a microphone array.
As shown in fig. 3, in step 301, a voice beam corresponding to the first voice signal fed back by the microphone array is acquired, and it is determined whether the voice beam comes from the speaking direction of the wearing user of the wearable device;
in step 302, if the voice beam comes from the speaking direction of the wearing user of the wearable device, the first voice signal is enhanced and stored.
In this embodiment, for step 301, the voice entry device for the wearable device acquires the voice beam corresponding to the first voice signal fed back by the microphone array and determines whether the beam comes from the speaking direction of the wearing user of the wearable device; if it does, the first voice signal can be attributed to the wearing user.
Then, for step 302, if the beam comes from the speaking direction of the wearing user of the wearable device, the first voice signal is enhanced and stored.
According to the method, the user's voice is collected through a microphone array and it is judged whether the voice beam corresponding to the collected voice signal comes from the direction of the wearing user, so that accurate voice entry can be achieved and entry errors are reduced.
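The direction check of steps 301 and 302 can be illustrated with a simple two-microphone time-difference-of-arrival estimate: if the dominant source does not lie within a tolerance cone around the wearer's mouth, the frame is discarded. The array geometry, sample rate and tolerance below are invented for the sketch; a real array would typically use beamforming over more elements.

```python
import numpy as np

SAMPLE_RATE = 16000          # Hz, assumed
MIC_SPACING = 0.04           # metres between the two microphones, assumed
SPEED_OF_SOUND = 343.0       # m/s
WEARER_ANGLE_DEG = 0.0       # expected direction of the wearer's mouth, assumed
ANGLE_TOLERANCE_DEG = 30.0   # assumed acceptance cone

def estimate_doa(left: np.ndarray, right: np.ndarray) -> float:
    """Estimate the direction of arrival (degrees) from the lag that maximises
    the cross-correlation between the two channels."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr)) - (len(right) - 1)     # delay in samples
    delay = lag / SAMPLE_RATE                         # delay in seconds
    sin_theta = np.clip(delay * SPEED_OF_SOUND / MIC_SPACING, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

def from_wearer_direction(left: np.ndarray, right: np.ndarray) -> bool:
    """True when the dominant source lies within the tolerance cone
    around the wearer's expected speaking direction."""
    return abs(estimate_doa(left, right) - WEARER_ANGLE_DEG) <= ANGLE_TOLERANCE_DEG

# Example: identical channels mean zero delay, i.e. a broadside (0 degree) arrival.
frame = np.random.randn(1600)
print(from_wearer_direction(frame, frame))            # True
```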
In the method of the above embodiment, the microphone array can be replaced by a directional microphone, and the directional microphone can be pointed to the speaking direction of the wearing user of the wearable device and can shield voice signals in other directions.
According to the method, the microphone array is replaced by a directional microphone, so that the wearing user's speaking direction can be targeted directly in hardware.
In the method of any of the above embodiments, the wearable device includes a chest badge, earphones, glasses, a neck-worn device, a wristband or a watch, and the wearable device can perform environmental monitoring, abnormal sound monitoring, health monitoring and scene recognition, for example by means of intelligent speech algorithms.
It should be noted that the above method steps are not intended to limit the execution order of the steps, and in fact, some steps may be executed simultaneously or in the reverse order of the steps, which is not limited herein.
The following description is provided to enable those skilled in the art to better understand the present disclosure by describing some of the problems encountered by the inventors in implementing the present disclosure and by describing one particular embodiment of the finally identified solution.
The inventor found, in the course of implementing the present application, that the defects of the prior art are mainly caused by the following: existing devices are inconvenient to carry, have to be held in the hand or placed on a desk or workbench during voice recording, and are not necessarily at hand when the user wants to use them; voice entry environments also vary, and in a service hall, restaurant, factory, hospital or similar setting the recording is easily disturbed by environmental noise or other people's voices. The problems caused by these drawbacks have long persisted in this field.
The inventor also found that at present there are few practical means for monitoring and evaluating the service attitude and quality of service personnel in various industries, because it is impossible to watch every staff member at all times; in hospitals, there is hardly any means of monitoring abnormal sounds such as a patient's breathing, coughing and snoring; and for scenarios that require record keeping, handwriting is slow, time-consuming and error-prone.
Voice entry is very fast in these scenarios, but it remains inconvenient if the entry device still has to be held in the hand, which makes a wearable design necessary. At the same time, many environments are complex, and the quality of the recorded voice is affected by other people's conversations, air-conditioner fans, cars driving quickly past on the road outside, and so on.
The scheme of the application is mainly designed and optimized from the following aspects:
This design builds the sound entry device into a wearable device, including but not limited to a chest badge, glasses, earphones, or a neck-worn device. The recorded sound is processed in hardware or software to shield external interference and non-wearer voices, yielding high-quality audio of the wearer. The recorded sound is then used for transcription and evaluation, environmental monitoring, health monitoring and so on, which achieves the stated purpose.
Referring to fig. 4, a flow chart of monitoring attendant attitudes of one specific example of a voice entry method for a wearable device of the present application is shown.
As shown in fig. 4, step 1: the voice signal acquisition module uses a microphone to acquire voice signals. Analog or digital microphones can be selected, and a microphone array can also be adopted so that subsequent voice signal processing algorithms can be applied conveniently;
step 2: the voice signal processing module can use a voiceprint recognition algorithm to judge whether the input voice carries the wearer's voiceprint information, or a microphone array signal processing algorithm can be used to steer the beam toward the wearer so that the wearer's voice is obtained through its directivity;
step 3: if the voice is judged not to be the wearer's, it is shielded and eliminated; if it is judged to be the wearer's, it is enhanced and retained;
step 4: the voice signal is converted into text information using an online or offline transcription function, and subsequent service quality evaluation is carried out.
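Step 4 mentions subsequent service quality evaluation without specifying how it is done. As a toy stand-in, the sketch below scores a transcript by counting polite versus impolite phrases; the phrase lists and scoring rule are invented for illustration, and a production system would more likely use a trained text classifier.

```python
POLITE = ("please", "thank you", "you're welcome", "may i help")
IMPOLITE = ("hurry up", "not my problem", "stop asking")

def service_attitude_score(transcript: str) -> float:
    """Return a score in [-1, 1]; positive leans polite, negative leans impolite."""
    text = transcript.lower()
    pos = sum(text.count(p) for p in POLITE)
    neg = sum(text.count(p) for p in IMPOLITE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

# Usage on one transcribed utterance produced by step 4:
print(service_attitude_score("Thank you for waiting, may I help you?"))  # 1.0
```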
An earlier (beta) version formed by the inventor in the course of implementing the invention:
A directional microphone can be used in the voice signal acquisition module. It points at the wearer's speaking direction directly in hardware and shields sounds from other directions, which is simple and convenient, but it has the drawbacks of high cost and a single, fixed directivity; unlike a microphone array, it cannot change its directivity flexibly through an algorithm.
Deeper effects found by the inventor in the course of implementing the invention:
The wearable design is more portable, frees the hands, is friendlier to use, and allows voice entry to be carried out anytime and anywhere;
intelligent voice algorithms help to realize voice separation, transcription evaluation, environment monitoring, abnormal sound monitoring, health monitoring, and so on.
The wearable design, which frees the hands, includes but is not limited to chest badges, earphones, glasses, neck-worn devices, etc.;
the intelligent voice algorithms that help to realize sound separation, transcription evaluation, environment monitoring, abnormal sound monitoring, health monitoring, and so on rely on measures including a directional microphone scheme, an array signal processing scheme, speaker segmentation and clustering, scene recognition, and the like.
Referring to fig. 5, a block diagram of a voice recording apparatus for a wearable device according to an embodiment of the present invention is shown.
As shown in fig. 5, the voice recording apparatus 500 for a wearable device includes an acquisition judging program module 510, an enhanced storage program module 520, and a conversion program module 530.
The acquisition judging program module 510 is configured to, in response to acquiring a first voice signal, determine whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device; the enhanced storage program module 520 is configured to enhance and store the first voice signal if they are consistent; and the conversion program module 530 is configured to convert the first voice signal into text information based on the stored first voice signal.
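As a rough structural sketch of the apparatus 500, the three program modules can be modelled as methods on one class, with the actual voiceprint matching, enhancement and transcription passed in as callables. The class and attribute names are assumptions for illustration, not an interface defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class VoiceEntryApparatus:
    match_voiceprint: Callable[[bytes], bool]      # acquisition judging module 510
    enhance: Callable[[bytes], bytes]              # part of enhanced storage module 520
    transcribe: Callable[[bytes], str]             # conversion module 530
    storage: List[bytes] = field(default_factory=list)

    def acquisition_judging(self, signal: bytes) -> bool:
        """Module 510: does the signal's voiceprint match the preset one?"""
        return self.match_voiceprint(signal)

    def enhanced_storage(self, signal: bytes) -> None:
        """Module 520: enhance and store a matching signal."""
        self.storage.append(self.enhance(signal))

    def conversion(self) -> List[str]:
        """Module 530: convert every stored signal into text."""
        return [self.transcribe(s) for s in self.storage]

    def handle(self, signal: bytes) -> Optional[str]:
        """Wire the three modules together for one incoming signal."""
        if not self.acquisition_judging(signal):
            return None                            # non-matching audio is dropped
        self.enhanced_storage(signal)
        return self.transcribe(self.storage[-1])

# Usage with trivial stand-ins for the three callables:
apparatus = VoiceEntryApparatus(lambda s: True, lambda s: s, lambda s: "<text>")
print(apparatus.handle(b"\x00\x01"))               # "<text>"
```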
It should be understood that the modules recited in fig. 5 correspond to various steps in the methods described with reference to fig. 1, 2, and 3. Thus, the operations and features described above for the method and the corresponding technical effects are also applicable to the modules in fig. 5, and are not described again here.
It should be noted that the modules in the embodiments of the present disclosure are not intended to limit the solution of the present disclosure; for example, the acquisition judging program module may be described as a module that, in response to acquiring a first voice signal, determines whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device. In addition, the related functional modules may also be implemented by a hardware processor; for example, the acquisition judging program module may be implemented by a processor, which is not described again here.
In other embodiments, an embodiment of the present invention further provides a non-volatile computer storage medium, where the computer storage medium stores computer-executable instructions, where the computer-executable instructions may execute the voice entry method for a wearable device in any of the above method embodiments;
as one embodiment, a non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
responding to the acquired first voice signal, and judging whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device or not;
if they are consistent, enhancing and storing the first voice signal;
and converting the first voice signal into text information based on the stored first voice signal.
The non-volatile computer-readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of a voice entry device for a wearable apparatus, and the like. Further, the non-volatile computer-readable storage medium may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium optionally includes a memory remotely located from the processor, which may be connected over a network to a voice entry device for the wearable apparatus. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Embodiments of the present invention also provide a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform any of the above-described voice entry methods for a wearable device.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device includes: one or more processors 610 and a memory 620, with one processor 610 taken as an example in fig. 6. The device for the voice entry method of the wearable device may further include: an input device 630 and an output device 640. The processor 610, the memory 620, the input device 630, and the output device 640 may be connected by a bus or other means, such as the bus connection in fig. 6. The memory 620 is a non-volatile computer-readable storage medium as described above. The processor 610 executes various functional applications of the server and data processing by running the non-volatile software programs, instructions and modules stored in the memory 620, namely, implementing the voice entry method for the wearable device of the above method embodiments. The input device 630 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the communication compensation device. The output device 640 may include a display device such as a display screen.
The product can execute the method provided by the embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
As an embodiment, the electronic device is applied to a voice recording apparatus for a wearable device, and is used for a client, and includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to:
responding to the acquired first voice signal, and judging whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device or not;
if they are consistent, enhancing and storing the first voice signal;
and converting the first voice signal into text information based on the stored first voice signal.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communications. Such terminals include smart phones (e.g., iPhones), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra mobile personal computer device: the equipment belongs to the category of personal computers, has calculation and processing functions and generally has the characteristic of mobile internet access. Such terminals include: PDA, MID, and UMPC devices, etc., such as ipads.
(3) A portable entertainment device: such devices can display and play multimedia content. Such devices include audio and video players (e.g., ipods), handheld game consoles, electronic books, as well as smart toys and portable car navigation devices.
(4) The server is similar to a general computer architecture, but has higher requirements on processing capability, stability, reliability, safety, expandability, manageability and the like because of the need of providing highly reliable services.
(5) And other electronic devices with data interaction functions.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

Translated from Chinese

1. A voice entry method for a wearable device, comprising: in response to acquiring a first voice signal, determining whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device; if they are consistent, enhancing and storing the first voice signal; and converting the first voice signal into text information based on the stored first voice signal.

2. The method according to claim 1, wherein after the determining whether the first voiceprint information corresponding to the first voice signal is consistent with the second voiceprint information preset in the wearable device, the method further comprises: if they are inconsistent, shielding and eliminating the first voice signal.

3. The method according to claim 1, wherein after the determining, in response to acquiring the first voice signal, whether the first voiceprint information corresponding to the first voice signal is consistent with the second voiceprint information preset in the wearable device, the method further comprises: in response to acquiring a second voice signal, determining whether third voiceprint information corresponding to the second voice signal is consistent with fourth voiceprint information preset in the wearable device; if they are consistent, classifying, storing and converting the first voice signal and the second voice signal into text based on the preset second voiceprint information and the preset fourth voiceprint information.

4. The method according to claim 1, wherein after the converting the first voice signal into text information, the method further comprises: acquiring voice evaluation information of a wearing user of the wearable device, and converting the voice evaluation information into evaluation text information.

5. The method according to claim 1, wherein the wearable device collects the user's voice through a microphone array, and the enhancing and storing the first voice signal comprises: acquiring a voice beam corresponding to the first voice signal fed back by the microphone array, and determining whether the voice beam comes from a speaking direction of a wearing user of the wearable device; if it comes from the speaking direction of the wearing user of the wearable device, enhancing and storing the first voice signal.

6. The method according to claim 3, wherein the microphone array can be replaced with a directional microphone, and the directional microphone can point to the speaking direction of the wearing user of the wearable device and can shield voice signals from other directions.

7. The method according to any one of claims 1 to 6, wherein the wearable device comprises a chest badge, earphones, glasses, a neck-worn device, a wristband and a watch, and the wearable device is capable of environmental monitoring, abnormal sound monitoring, health monitoring and scene recognition.

8. A voice entry apparatus for a wearable device, comprising: an acquisition judging program module configured to, in response to acquiring a first voice signal, determine whether first voiceprint information corresponding to the first voice signal is consistent with second voiceprint information preset in the wearable device; an enhanced storage program module configured to enhance and store the first voice signal if they are consistent; and a conversion program module configured to convert the first voice signal into text information based on the stored first voice signal.

9. An electronic device, comprising: at least one processor, and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the method according to any one of claims 1 to 7.

10. A storage medium on which a computer program is stored, wherein when the program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.
CN202110650959.XA | 2021-06-09 | 2021-06-09 | Voice entry method and device for wearable device | Withdrawn | CN113241077A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110650959.XA (CN113241077A) | 2021-06-09 | 2021-06-09 | Voice entry method and device for wearable device

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110650959.XA (CN113241077A) | 2021-06-09 | 2021-06-09 | Voice entry method and device for wearable device

Publications (1)

Publication Number | Publication Date
CN113241077A | 2021-08-10

Family

ID=77139663

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202110650959.XA (CN113241077A, Withdrawn) | Voice entry method and device for wearable device | 2021-06-09 | 2021-06-09

Country Status (1)

Country | Link
CN (1) | CN113241077A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20160275952A1 (en)* | 2015-03-20 | 2016-09-22 | Microsoft Technology Licensing, Llc | Communicating metadata that identifies a current speaker
US20200184057A1 (en)* | 2017-05-19 | 2020-06-11 | Plantronics, Inc. | Headset for Acoustic Authentication of a User
US20210051152A1 (en)* | 2017-08-10 | 2021-02-18 | Nuance Communications, Inc. | Ambient Cooperative Intelligence System and Method
CN109360549A (en)* | 2018-11-12 | 2019-02-19 | Beijing Sogou Technology Development Co., Ltd. | Data processing method, apparatus, and device for data processing
CN111696536A (en)* | 2020-06-05 | 2020-09-22 | Beijing Sogou Technology Development Co., Ltd. | Voice processing method, apparatus and medium
CN112750465A (en)* | 2020-12-29 | 2021-05-04 | Duke Kunshan University | Cloud language ability evaluation system and wearable recording terminal

Similar Documents

Publication | Title
CN110544488B (en) | Method and device for separating multi-person voice
JP6812604B2 (en) | Audio activity tracking and summarization
CN1761265B (en) | Method and apparatus for multi-sensory speech enhancement on a mobile device
JP6164076B2 (en) | Information processing apparatus, information processing method, and program
CN109040641B (en) | Video data synthesis method and device
EP4138355A1 (en) | In-vehicle voice interaction method and device
CN111985252B (en) | Dialogue translation method and device, storage medium and electronic device
US20180054688A1 (en) | Personal Audio Lifestyle Analytics and Behavior Modification Feedback
CN111107278B (en) | Image processing method and device, electronic equipment and readable storage medium
CN112532266A (en) | Intelligent helmet and voice interaction control method of intelligent helmet
CN107004414A (en) | Information processing device, information processing method and program
CN108763475B (en) | Recording method, recording device and terminal equipment
EP4141867A1 (en) | Voice signal processing method and related device therefor
CN114662606A (en) | Behavior recognition method and apparatus, computer readable medium and electronic device
CN111629156A (en) | Image special effect triggering method and device and hardware device
CN113033245A (en) | Function adjusting method and device, storage medium and electronic equipment
WO2022199500A1 (en) | Model training method, scene recognition method, and related device
CN111508531A (en) | Audio processing method and device
CN108073572A (en) | Information processing method and its device, simultaneous interpretation system
CN115243134A (en) | Signal processing method, device, smart head-mounted device, and medium
CN109949793A (en) | Method and apparatus for output information
CN108766416A (en) | Audio recognition method and related product
CN115220230A (en) | AR glasses that can distinguish sound sources
WO2016206646A1 (en) | Method and system for urging machine device to generate action
CN112740219A (en) | Generating method, device, storage medium and electronic device for gesture recognition model

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
WW01 | Invention patent application withdrawn after publication

Application publication date: 2021-08-10

