CN113963396A - Face recognition enhancement method and device, electronic equipment and storage medium - Google Patents

Face recognition enhancement method and device, electronic equipment and storage medium

Info

Publication number
CN113963396A
Authority
CN
China
Prior art keywords
recognition result
recognition
result
target object
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010700877.7A
Other languages
Chinese (zh)
Inventor
陈坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd
Priority to CN202010700877.7A
Publication of CN113963396A
Legal status: Withdrawn

Abstract

The embodiment of the invention discloses a face recognition enhancement method, a face recognition enhancement device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a target image containing a target object, and performing face recognition on the target image to obtain a first recognition result; acquiring audio data of a target object acquired by an audio acquisition device, and performing voice recognition on the audio data to obtain a second recognition result; and based on the second recognition result, correcting or supplementing the first recognition result to obtain a final recognition result. In the embodiment of the invention, the face recognition result of the target object is corrected or supplemented by utilizing the voice recognition result of the target object, so that the accuracy of face recognition is improved.

Description

Face recognition enhancement method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of video monitoring, in particular to a face recognition enhancement method and device, electronic equipment and a storage medium.
Background
In recent years, video monitoring systems with intelligent face recognition functions have been applied more and more widely, with products of different specifications deployed in settings ranging from large-scale safe-city projects to small stores. When such a product performs face recognition, factors such as the shooting angle, the scene lighting, and the person being in motion cause some of the acquired images to be of low quality, which lowers face recognition accuracy.
Disclosure of Invention
The embodiment of the invention provides a face recognition enhancement method, a face recognition enhancement device, electronic equipment and a storage medium, and aims to improve the face recognition accuracy.
In a first aspect, an embodiment of the present invention provides a face recognition enhancement method, where the method includes:
acquiring a target image containing a target object, and performing face recognition on the target image to obtain a first recognition result;
acquiring audio data of a target object acquired by an audio acquisition device, and performing voice recognition on the audio data to obtain a second recognition result;
and based on the second recognition result, correcting or supplementing the first recognition result to obtain a final recognition result.
In a second aspect, an embodiment of the present invention provides a face recognition enhancing apparatus, where the apparatus includes:
the face recognition module is used for acquiring a target image containing a target object and performing face recognition on the target image to obtain a first recognition result;
the voice recognition module is used for acquiring the audio data of the target object acquired by the audio acquisition device and performing voice recognition on the audio data to obtain a second recognition result;
and the result correction module is used for correcting or supplementing the first recognition result based on the second recognition result to obtain a final recognition result.
In a third aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a face recognition enhancement method according to any embodiment of the invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a face recognition enhancement method according to any embodiment of the present invention.
In the embodiment of the invention, after the face recognition is carried out on the target image containing the target object, the voice recognition is carried out on the collected audio data of the target object, and then the face recognition result of the target object is corrected or supplemented based on the voice recognition result of the target object, so that the aim of improving the accuracy of the face recognition is fulfilled.
Drawings
Fig. 1 is a schematic flow chart of a face recognition enhancement method according to a first embodiment of the present invention;
Fig. 2 is a schematic flow chart of a face recognition enhancement method in the second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a face recognition enhancing apparatus according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device in a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a face recognition enhancement method according to the first embodiment of the present invention. This embodiment is applicable to situations in which face recognition is required, for example in a video monitoring system with a face recognition function. The method can be executed by a face recognition enhancement device, which can be implemented in software and/or hardware and can be integrated on an electronic device, for example a camera with an audio collector, or a back-end server.
As shown in fig. 1, the face recognition enhancement method specifically includes the following steps:
S101, obtaining a target image containing a target object, and performing face recognition on the target image to obtain a first recognition result.
In the embodiment of the present invention, the target image of the target object may be image data consisting of a single frame collected by the monitoring camera, or video data consisting of multiple frames. When face recognition is performed on the target image, optionally, a face area in the target image is detected first, the face image is extracted from the target image based on the face area, and face recognition is then performed on the face image to obtain a first recognition result. The first recognition result includes structured information about the person, such as gender, age group, race, emotion, whether glasses are worn, and whether the person has a beard.
In an alternative embodiment, performing face recognition on a target image to obtain a first recognition result includes:
inputting the target image into a pre-trained face recognition model, and obtaining a first recognition result according to the output of the face recognition model, wherein the first recognition result comprises the recognition result of at least one biological feature and the confidence coefficient of the recognition result of each biological feature.
In the embodiment of the invention, the pre-trained face recognition model may be a trained convolutional neural network model. The biological features in the first recognition result include at least the structured person information mentioned above (gender, age group, race, emotion, whether glasses are worn, whether the person has a beard, and so on). Examples of recognition results for biological features are: gender: male; age: 20-24 years old; race: Asian. The confidence of a recognition result represents how credible that result is: the higher the confidence of a recognition result, the more reliable the prediction is considered to be.
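One possible in-memory shape for such a recognition result is sketched below. The names `FeatureResult` and `first_result` are hypothetical and do not come from the patent, and the age-group confidence is an invented placeholder; only the idea of pairing each biological feature with a value and a confidence is taken from the text.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FeatureResult:
    """Recognition result for one biological feature (hypothetical type)."""
    value: str
    confidence: float  # assumed to be normalized to the range 0.0-1.0

    def __post_init__(self) -> None:
        # Reject confidences outside the assumed 0.0-1.0 range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# A recognition result maps a feature name to its value and confidence.
RecognitionResult = Dict[str, FeatureResult]

# Illustrative first recognition result; the numbers are placeholders.
first_result: RecognitionResult = {
    "gender": FeatureResult("male", 0.60),
    "age_group": FeatureResult("20-24", 0.75),
}
```

Keeping the confidence next to each value is what later allows a per-feature comparison between the face result and the voice result.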
It should be noted that, if the quality of the acquired target image including the target object is poor, for example, the resolution is low, before the target image is input into the pre-trained face recognition model, the target image may be preprocessed, for example, the target image is cut to obtain a cut image, and then the cut image is reconstructed by using the pre-trained image reconstruction model, so that face recognition may be performed based on the reconstructed target image, and thus the accuracy of face recognition is ensured.
S102, audio data of the target object collected by the audio collector are obtained, and voice recognition is carried out on the audio data to obtain a second recognition result.
At least two audio collectors are arranged on the camera of the monitoring system; an audio collector may be, for example, a sound pickup. The audio data of the target object is obtained through the audio collectors, and the collected audio data is then recognized using voice recognition technology to obtain a second recognition result.
In an alternative embodiment, performing speech recognition on the audio data to obtain the second recognition result includes:
and inputting the audio data into a pre-trained voice recognition model, and obtaining a second recognition result according to the output of the voice recognition model, wherein the second recognition result comprises the recognition result of at least one biological characteristic and the confidence coefficient of the recognition result of each biological characteristic.
It should be noted that the speech recognition model is obtained by training based on audio data corresponding to different biological features as training samples, and for example, audio data of males or females of different ages may be collected as training samples.
S103, correcting or supplementing the first recognition result based on the second recognition result to obtain a final recognition result.
In an alternative embodiment, modifying or supplementing the first recognition result based on the second recognition result to obtain a final recognition result, includes:
and comparing at least one biological characteristic and the confidence coefficient thereof in the second recognition result with at least one biological characteristic and the confidence coefficient thereof in the first recognition result, and correcting or supplementing the first recognition result according to the comparison result to obtain a final recognition result.
Optionally, for any biometric feature shared by the first recognition result and the second recognition result, if the confidence of the recognition result of the biometric feature in the second recognition result is greater than the confidence of the recognition result of the biometric feature in the first recognition result, the recognition result of the biometric feature in the first recognition result is replaced with the recognition result of the biometric feature in the second recognition result. For example, for the biological feature of "gender", in the first recognition result: gender male, confidence 60%; and in the second recognition result: gender female, confidence 85%; the second recognition result is considered to be more accurate, and the gender male in the first recognition result can be replaced by the gender female.
For any biometric feature that is present in the second recognition result but not in the first recognition result, the recognition result of that biometric feature is added to the first recognition result. That is, features that can only be obtained by voice recognition are added to the face image recognition result. For example, suppose the biometric feature is the province to which the target object belongs: when voice recognition is performed on the audio data of the target object, the voice characteristics indicate that the target object is from Northeast China, that is, from the three northeastern provinces, with a confidence of 80%. The result "the target object belongs to the three northeastern provinces" can therefore be added to the first recognition result, enriching the face recognition result.
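The replace-or-supplement rule described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `merge_results`, the tuple layout, and the age-group values are assumptions, and carrying the voice confidence over together with the replaced value is a design choice the text does not specify.

```python
from typing import Dict, Tuple

# feature name -> (recognized value, confidence in the range 0.0-1.0)
Result = Dict[str, Tuple[str, float]]


def merge_results(first: Result, second: Result) -> Result:
    """Correct or supplement the face result using the voice result."""
    final = dict(first)  # leave the caller's first result untouched
    for feature, (value, confidence) in second.items():
        if feature not in final:
            # Feature only available from voice recognition: supplement.
            final[feature] = (value, confidence)
        elif confidence > final[feature][1]:
            # Shared feature with higher voice confidence: replace.
            final[feature] = (value, confidence)
    return final


# Gender and region follow the examples in the text; age_group is invented.
face = {"gender": ("male", 0.60), "age_group": ("20-24", 0.90)}
voice = {
    "gender": ("female", 0.85),
    "age_group": ("25-29", 0.70),
    "region": ("three northeastern provinces", 0.80),
}
final = merge_results(face, voice)
```

Here the gender is replaced because the voice confidence is higher, the age group is kept because the face confidence is higher, and the region is added because only the voice result contains it.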
In the embodiment of the invention, after the face recognition is carried out on the target image containing the target object, the voice recognition is carried out on the collected audio data of the target object, and then the face recognition result of the target object is corrected or supplemented based on the voice recognition result of the target object, so that the aim of improving the accuracy of the face recognition is fulfilled.
Example two
Fig. 2 is a flowchart of a face recognition enhancement method according to a second embodiment of the present invention, where the present embodiment is optimized based on the foregoing embodiment, and adds an operation of acquiring audio data of a target object, as shown in fig. 2, the method includes:
S201, obtaining a target image containing a target object, and performing face recognition on the target image to obtain a first recognition result.
After the face recognition model is used for carrying out face recognition on the target image, the recognition result of at least one biological feature of the target object can be obtained, and the coordinate information (namely the position information) of the target object in the target image can also be obtained.
S202, acquiring position information of the target object in the target image and distance information between the target object and the camera.
After the position information of the target object in the target image is acquired from the first recognition result, the distance information between the target object and the camera can be calculated according to the following formula:
[Formula image BDA0002592992750000071 is not reproduced in this text.]
where D is the distance between the camera lens and the target object; f is the focal length of the lens; h is the height of the target surface (image sensor) of the camera, which is fixed; and H is the height of the scene captured by the camera lens, which is known in advance.
Because of the requirements of face recognition, the cameras used generally have fixed-focus lenses, so the focal length f of the camera lens is known. The distance between the target object and the camera can therefore be calculated directly from the values of f, h and H.
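The formula itself is not reproduced in this text. A common lens relation that matches the symbol definitions is f = h × D / H, which gives D = f × H / h; the sketch below assumes that relation, and the patent's actual formula may differ. The function name and the example values (an 8 mm lens, a 4.8 mm sensor height, a 3 m scene height) are hypothetical.

```python
def estimate_distance(focal_length_mm: float,
                      sensor_height_mm: float,
                      scene_height_m: float) -> float:
    """Estimate the lens-to-target distance D in meters.

    Assumes the relation D = f * H / h, where f is the focal length,
    h is the sensor (target surface) height and H is the scene height.
    """
    if focal_length_mm <= 0 or sensor_height_mm <= 0 or scene_height_m <= 0:
        raise ValueError("all inputs must be positive")
    # f and h are both in millimeters, so their ratio is unitless and
    # the result carries the unit of the scene height (meters).
    return focal_length_mm * scene_height_m / sensor_height_mm


# Illustrative values only: the result is roughly 5 meters.
distance_m = estimate_distance(focal_length_mm=8.0,
                               sensor_height_mm=4.8,
                               scene_height_m=3.0)
```

With a fixed-focus lens, f and h are constants of the camera, so the estimate depends only on the scene height H.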
S203, positioning the source of the audio data according to the audio data collected by the audio collector to obtain at least one sound source position.
Because the camera is provided with at least two audio collectors, the sound they collect can be localized, optionally using methods based on distance difference, energy difference, and the like. For example, two audio collectors can determine the approximate position of a sound source in a two-dimensional plane on the camera's monitoring line; as the number of audio collectors increases, a more accurate sound source position can be obtained by combining the individual calculations. It should be noted that, since there may be multiple sound sources in a monitored scene, all sound sources in the scene need to be localized, yielding at least one sound source position.
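As an illustration of the distance-difference idea with two audio collectors, the sketch below converts an arrival-time delay into a bearing angle. It assumes a far-field source, a known spacing between the two collectors, and a nominal speed of sound; the function name and all numbers are hypothetical, and the patent does not specify this particular calculation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at 20 °C


def bearing_from_delay(delay_s: float,
                       mic_spacing_m: float,
                       speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Return the source bearing in degrees relative to the broadside
    direction of a two-microphone pair (far-field assumption).

    The path-length (distance) difference is speed * delay, and
    sin(bearing) = distance difference / microphone spacing.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    ratio = speed_m_s * delay_s / mic_spacing_m
    # Clamp to the valid asin domain to tolerate measurement noise.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))


# A delay of zero means the source lies straight ahead of the pair.
straight_ahead = bearing_from_delay(0.0, mic_spacing_m=0.2)
```

Two collectors give only a bearing, which matches the text's remark that the position is approximate; additional collectors add further constraints that narrow the estimate.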
S204, determining a target sound source position corresponding to the target object from at least one sound source position according to the position information and the distance information.
Since at least one sound source position is obtained in S203, to accurately acquire the audio data of the target object, a target sound source position corresponding to the target object needs to be determined from a plurality of sound source positions. Optionally, the coordinate information of the target object and the distance between the target object and the camera determined in S202 are compared with the positions of the sound sources, so as to determine the position of the target sound source corresponding to the target object.
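The comparison in this step could look like the sketch below. It assumes that both the target object and each sound source have already been expressed as a (bearing in degrees, distance in meters) pair relative to the camera; that representation, the tolerance values, and the function name `pick_target_source` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (bearing in degrees, distance in meters), relative to the camera
Position = Tuple[float, float]


def pick_target_source(target: Position,
                       sources: List[Position],
                       max_bearing_err_deg: float = 15.0,
                       max_distance_err_m: float = 2.0) -> Optional[int]:
    """Return the index of the sound source that best matches the target,
    or None if no source falls within both tolerances."""
    best_index: Optional[int] = None
    best_cost = float("inf")
    for index, (bearing, distance) in enumerate(sources):
        bearing_err = abs(bearing - target[0])
        distance_err = abs(distance - target[1])
        if bearing_err > max_bearing_err_deg or distance_err > max_distance_err_m:
            continue  # too far from the target to be the same person
        # Normalize both errors so neither unit dominates the cost.
        cost = bearing_err / max_bearing_err_deg + distance_err / max_distance_err_m
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index


target_position = (10.0, 5.0)
source_positions = [(-40.0, 3.0), (12.0, 5.5), (11.0, 9.0)]
chosen = pick_target_source(target_position, source_positions)
```

Returning None when nothing matches lets the caller fall back to the face result alone instead of attributing another speaker's audio to the target object.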
S205, acquiring the audio data of the target object acquired by the audio acquisition unit from the position of the target sound source.
After determining the target sound source position corresponding to the target object, the audio data of the target object may be collected from the target sound source position, and then S206 is performed to identify the audio data of the target object.
S206, performing voice recognition on the audio data of the target object to obtain a second recognition result.
S207, based on the second recognition result, correcting or supplementing the first recognition result to obtain a final recognition result.
According to the embodiment of the invention, the target sound source position corresponding to the target object is determined according to the position of the target object, the distance between the target object and the camera and the position of the sound source, and the audio data collected from the target sound source position is further acquired, so that the accuracy of acquiring the audio data of the target object is ensured, and the accuracy of correcting the face recognition result based on the voice recognition result is further ensured.
Example three
Fig. 3 is a schematic structural diagram of a face recognition enhancing device in a third embodiment of the present invention, where this embodiment is applicable to a case where face recognition is required, and the device may be configured on a camera or a back-end server provided with at least two audio collectors, referring to fig. 3, and the device includes:
the face recognition module 301 is configured to acquire a target image including a target object, and perform face recognition on the target image to obtain a first recognition result;
the voice recognition module 302 is configured to obtain audio data of the target object collected by the audio collector, and perform voice recognition on the audio data to obtain a second recognition result;
and a result correction module 303, configured to correct or supplement the first recognition result based on the second recognition result, so as to obtain a final recognition result.
In the embodiment of the invention, after the face recognition is carried out on the target image containing the target object, the voice recognition is carried out on the collected audio data of the target object, and then the face recognition result of the target object is corrected or supplemented based on the voice recognition result of the target object, so that the aim of improving the accuracy of the face recognition is fulfilled.
On the basis of the foregoing embodiment, optionally, the voice recognition module includes:
a position and distance information acquiring unit for acquiring position information of the target object in the target image and distance information of the target object from the camera;
the first positioning unit is used for positioning the source of the audio data according to the audio data collected by the audio collector to obtain at least one sound source position;
the second positioning unit is used for determining a target sound source position corresponding to the target object from at least one sound source position according to the position information and the distance information;
and the voice acquisition unit is used for acquiring the audio data of the target object acquired by the audio acquisition unit from the position of the target sound source.
On the basis of the above embodiment, optionally, the face recognition module is specifically configured to:
inputting the target image into a pre-trained face recognition model, and obtaining a first recognition result according to the output of the face recognition model, wherein the first recognition result comprises the recognition result of at least one biological feature and the confidence coefficient of the recognition result of each biological feature.
On the basis of the above embodiment, optionally, the voice recognition module is specifically configured to:
and inputting the audio data into a pre-trained voice recognition model, and obtaining a second recognition result according to the output of the voice recognition model, wherein the second recognition result comprises a recognition result of at least one biological characteristic and the confidence coefficient of the recognition result of each biological characteristic.
On the basis of the foregoing embodiment, optionally, the result correction module includes:
and the result correcting unit is used for comparing at least one biological characteristic and the confidence coefficient thereof in the second recognition result with at least one biological characteristic and the confidence coefficient thereof in the first recognition result, and correcting or supplementing the first recognition result according to the comparison result to obtain a final recognition result.
On the basis of the foregoing embodiment, optionally, the result correction unit is specifically configured to:
for any biological feature shared by the first recognition result and the second recognition result, if the confidence coefficient of the recognition result of the biological feature in the second recognition result is greater than that of the recognition result of the biological feature in the first recognition result, replacing the recognition result of the biological feature in the first recognition result with the recognition result of the biological feature in the second recognition result;
for any biometric feature that is present in the second recognition result and is not present in the first recognition result, the recognition result of the biometric feature is supplemented to the first recognition result.
The face recognition enhancement device provided by the embodiment of the invention can execute the face recognition enhancement method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
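The three-module structure of the device can be sketched as a small class that wires two models to a correction step. The class name, the stub models, and the use of plain callables are hypothetical; a real device would load trained face and voice models rather than the fixed stubs used here.

```python
from typing import Callable, Dict, Tuple

# feature name -> (recognized value, confidence in the range 0.0-1.0)
Result = Dict[str, Tuple[str, float]]
Model = Callable[[bytes], Result]


class FaceRecognitionEnhancer:
    """Hypothetical wrapper mirroring modules 301, 302 and 303."""

    def __init__(self, face_model: Model, voice_model: Model) -> None:
        self.face_model = face_model
        self.voice_model = voice_model

    def recognize(self, image: bytes, audio: bytes) -> Result:
        first = self.face_model(image)        # face recognition module 301
        second = self.voice_model(audio)      # voice recognition module 302
        return self.correct(first, second)    # result correction module 303

    @staticmethod
    def correct(first: Result, second: Result) -> Result:
        final = dict(first)
        for feature, (value, confidence) in second.items():
            # Supplement missing features; replace lower-confidence ones.
            if feature not in final or confidence > final[feature][1]:
                final[feature] = (value, confidence)
        return final


def stub_face_model(image: bytes) -> Result:
    return {"gender": ("male", 0.60)}  # fixed placeholder output


def stub_voice_model(audio: bytes) -> Result:
    return {"gender": ("female", 0.85)}  # fixed placeholder output


enhancer = FaceRecognitionEnhancer(stub_face_model, stub_voice_model)
final_result = enhancer.recognize(b"", b"")
```

Passing the models in as callables keeps the correction logic independent of any particular model framework, which suits a device that may run either on the camera or on a back-end server.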
Example four
Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Fig. 4 shows a block diagram of an exemplary electronic device 12 suitable for implementing an embodiment of the present invention. In this embodiment, the electronic device may be a camera provided with an audio collector, or a back-end server. The electronic device 12 shown in Fig. 4 is only an example and should not bring any limitation to the function and the scope of use of the embodiment of the present invention.
As shown in Fig. 4, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, enhanced ISA bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running the program stored in the system memory 28, for example, to implement the face recognition enhancement method provided by the embodiment of the present invention, the method including:
acquiring a target image containing a target object, and performing face recognition on the target image to obtain a first recognition result;
acquiring audio data of a target object acquired by an audio acquisition device, and performing voice recognition on the audio data to obtain a second recognition result;
and based on the second recognition result, correcting or supplementing the first recognition result to obtain a final recognition result.
Example five
The fifth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the face recognition enhancement method provided in the embodiments of the present invention, where the method includes:
acquiring a target image containing a target object, and performing face recognition on the target image to obtain a first recognition result;
acquiring audio data of a target object acquired by an audio acquisition device, and performing voice recognition on the audio data to obtain a second recognition result;
and based on the second recognition result, correcting or supplementing the first recognition result to obtain a final recognition result.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from the spirit of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

Translated from Chinese
1. A face recognition enhancement method, characterized in that the method comprises:
acquiring a target image containing a target object, and performing face recognition on the target image to obtain a first recognition result;
acquiring audio data of the target object collected by an audio collector, and performing speech recognition on the audio data to obtain a second recognition result; and
correcting or supplementing the first recognition result based on the second recognition result to obtain a final recognition result.

2. The method according to claim 1, characterized in that acquiring the audio data of the target object collected by the audio collector comprises:
acquiring position information of the target object in the target image and distance information between the target object and the camera;
locating the source of the audio data collected by the audio collector to obtain at least one sound source position;
determining, from the at least one sound source position, a target sound source position corresponding to the target object according to the position information and the distance information; and
acquiring the audio data of the target object collected by the audio collector from the target sound source position.

3. The method according to claim 1, characterized in that performing face recognition on the target image to obtain the first recognition result comprises:
inputting the target image into a pre-trained face recognition model, and obtaining the first recognition result from the output of the face recognition model, wherein the first recognition result includes a recognition result of at least one biometric feature and a confidence of the recognition result of each biometric feature.

4. The method according to claim 3, characterized in that performing speech recognition on the audio data to obtain the second recognition result comprises:
inputting the audio data into a pre-trained speech recognition model, and obtaining the second recognition result from the output of the speech recognition model, wherein the second recognition result includes a recognition result of at least one biometric feature and a confidence of the recognition result of each biometric feature.

5. The method according to claim 4, characterized in that correcting or supplementing the first recognition result based on the second recognition result to obtain the final recognition result comprises:
comparing the at least one biometric feature and its confidence in the second recognition result with the at least one biometric feature and its confidence in the first recognition result, and correcting or supplementing the first recognition result according to the comparison result to obtain the final recognition result.

6. The method according to claim 5, characterized in that correcting or supplementing the first recognition result according to the comparison result comprises:
for any biometric feature common to the first recognition result and the second recognition result, if the confidence of the recognition result of that biometric feature in the second recognition result is greater than the confidence of the recognition result of that biometric feature in the first recognition result, replacing the recognition result of that biometric feature in the first recognition result with the recognition result of that biometric feature in the second recognition result; and
for any biometric feature that exists in the second recognition result but not in the first recognition result, adding the recognition result of that biometric feature to the first recognition result.

7. A face recognition enhancement device, characterized in that the device comprises:
a face recognition module, configured to acquire a target image containing a target object and perform face recognition on the target image to obtain a first recognition result;
a speech recognition module, configured to acquire audio data of the target object collected by an audio collector and perform speech recognition on the audio data to obtain a second recognition result; and
a result correction module, configured to correct or supplement the first recognition result based on the second recognition result to obtain a final recognition result.

8. The device according to claim 7, characterized in that the speech recognition module comprises:
a position and distance information acquisition unit, configured to acquire position information of the target object in the target image and distance information between the target object and the camera;
a first positioning unit, configured to locate the source of the audio data collected by the audio collector to obtain at least one sound source position;
a second positioning unit, configured to determine, from the at least one sound source position, a target sound source position corresponding to the target object according to the position information and the distance information; and
a voice acquisition unit, configured to acquire the audio data of the target object collected by the audio collector from the target sound source position.

9. An electronic device, characterized by comprising:
one or more processors; and
a storage device, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the face recognition enhancement method according to any one of claims 1-6.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the face recognition enhancement method according to any one of claims 1-6 is implemented.
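As an illustration only, the correction and supplementation rule of claim 6 can be sketched in a few lines of Python. The `Attribute` type, the `merge_results` function, and the feature names (`gender`, `age`, `accent`) are hypothetical and do not appear in the application; the sketch assumes each recognition result is a mapping from a biometric feature name to a value and a confidence.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Attribute:
    """One biometric recognition result with its confidence (hypothetical type)."""
    value: str
    confidence: float


def merge_results(first: Dict[str, Attribute],
                  second: Dict[str, Attribute]) -> Dict[str, Attribute]:
    """Correct or supplement the face result (first) using the speech result (second)."""
    final = dict(first)  # start from a copy of the face recognition result
    for name, speech_attr in second.items():
        face_attr = first.get(name)
        if face_attr is None:
            # Feature exists only in the second result: supplement.
            final[name] = speech_attr
        elif speech_attr.confidence > face_attr.confidence:
            # Shared feature and the speech result is more confident: replace.
            final[name] = speech_attr
        # Otherwise the face recognition result is kept unchanged.
    return final


# Hypothetical example values, chosen only to exercise all three branches.
face = {"gender": Attribute("female", 0.55), "age": Attribute("30-40", 0.90)}
speech = {"gender": Attribute("male", 0.80), "age": Attribute("20-30", 0.60),
          "accent": Attribute("northern", 0.70)}
final = merge_results(face, speech)
```

With these example inputs, `gender` is replaced by the more confident speech result, `age` keeps the face result, and `accent` is added because only the speech result contains it. The claim does not say what happens when the two confidences are equal; this sketch keeps the face result in that case.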
CN202010700877.7A | Priority date: 2020-07-20 | Filing date: 2020-07-20 | Face recognition enhancement method and device, electronic equipment and storage medium | Withdrawn | CN113963396A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010700877.7A (CN113963396A (en)) | 2020-07-20 | 2020-07-20 | Face recognition enhancement method and device, electronic equipment and storage medium

Publications (1)

Publication Number | Publication Date
CN113963396A (en) | 2022-01-21

Family

ID=79459483

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202010700877.7A | Withdrawn | CN113963396A (en) | 2020-07-20 | 2020-07-20 | Face recognition enhancement method and device, electronic equipment and storage medium

Country Status (1)

Country | Link
CN (1) | CN113963396A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117112795A (en)* | 2023-07-11 | 2023-11-24 | 北京达佳互联信息技术有限公司 | Identification name recognition method, device, electronic equipment and storage medium
WO2025130575A1 (en)* | 2023-12-21 | 2025-06-26 | 蔚来汽车科技(安徽)有限公司 | Identity attribute recognition method, computer readable storage medium and intelligent device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US20190005961A1 (en)* | 2017-06-28 | 2019-01-03 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and device for processing voice message, terminal and storage medium
CN109903392A (en)* | 2017-12-11 | 2019-06-18 | 北京京东尚科信息技术有限公司 | Augmented reality method and device
CN110321863A (en)* | 2019-07-09 | 2019-10-11 | 北京字节跳动网络技术有限公司 | Age recognition methods and device, storage medium
CN110503045A (en)* | 2019-08-26 | 2019-11-26 | 北京华捷艾米科技有限公司 | A kind of Face detection method and device

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
WW01 | Invention patent application withdrawn after publication

Application publication date: 2022-01-21

