wherein D is the distance between the camera lens and the target object; f denotes the focal length of the lens; h is the height of the target surface (sensor) of the camera lens, which is fixed; H is the height of the scene captured by the camera lens, which is known in advance.

Because of the requirements of face recognition, the cameras used generally have fixed-focus lenses, i.e. the focal length f of the camera lens is known. Therefore, the distance between the target object and the camera can be calculated directly from the values of f, h and H.
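As a minimal sketch of this calculation, assume the usual pinhole similar-triangles relation f / D = h / H, i.e. D = f × H / h, where h is the sensor (target surface) height and H is the height of the captured scene. The formula given earlier in the description may be stated differently; the function name and the numeric values below are illustrative only.

```python
def estimate_distance(f_mm: float, sensor_height_mm: float, scene_height_mm: float) -> float:
    """Return the camera-to-target distance D in millimetres.

    Assumes the pinhole relation f / D = h / H, i.e. D = f * H / h,
    with h = sensor_height_mm and H = scene_height_mm.
    """
    if f_mm <= 0 or sensor_height_mm <= 0 or scene_height_mm <= 0:
        raise ValueError("f, h and H must all be positive")
    return f_mm * scene_height_mm / sensor_height_mm


# Hypothetical values: 4 mm lens, 3.6 mm sensor height, 2.7 m scene height.
distance_mm = estimate_distance(4.0, 3.6, 2700.0)
```

All three inputs must use the same length unit; the result is then in that unit as well.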

S203, positioning the source of the audio data according to the audio data collected by the audio collector to obtain at least one sound source position.

Because the camera is provided with at least two audio collectors, the source of the sound they collect can be located. Optionally, the sound is located using methods based on distance difference, energy difference, or the like. For example, two audio collectors can determine the approximate position of the sound source in a two-dimensional plane along the camera monitoring line, and as the number of audio collectors increases, a more accurate sound source position can be obtained by combining the individual estimates. It should be noted that, since a monitored scene may contain multiple sound sources, all sound sources in the scene need to be located, which yields at least one sound source position.
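One common way to realize distance-difference positioning with two audio collectors is to convert the arrival-time difference between them into a direction of arrival. The sketch below illustrates this under a far-field assumption (the source is much farther away than the spacing of the collectors); it is not necessarily the method used in this embodiment, and the function name, the speed-of-sound constant and the spacing value are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 degrees C


def direction_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Return the far-field arrival angle in degrees for a two-collector pair.

    0 degrees means the source is straight ahead (broadside); positive or
    negative angles mean the sound reached one collector before the other.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    # Extra path length travelled to the farther collector.
    path_difference_m = SPEED_OF_SOUND_M_S * delay_s
    # Clamp to [-1, 1] so measurement noise cannot push asin out of range.
    ratio = max(-1.0, min(1.0, path_difference_m / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical example: collectors 10 cm apart, delay chosen so that the
# path difference is half the spacing.
angle_deg = direction_from_delay(0.5 * 0.1 / SPEED_OF_SOUND_M_S, 0.1)
```

A single pair only yields a direction; additional collectors, or an energy-difference cue, would be needed to narrow the estimate down to a position.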

S204, determining a target sound source position corresponding to the target object from the at least one sound source position according to the position information and the distance information.

Since at least one sound source position is obtained in S203, to accurately acquire the audio data of the target object, a target sound source position corresponding to the target object needs to be determined from a plurality of sound source positions. Optionally, the coordinate information of the target object and the distance between the target object and the camera determined in S202 are compared with the positions of the sound sources, so as to determine the position of the target sound source corresponding to the target object.
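The comparison described above could, for instance, be a nearest-neighbour match. The sketch below assumes that the position information and distance information of the target object have already been converted into a 2-D point (lateral offset, depth) in the same coordinate frame as the sound source positions; that conversion, the function name and the error threshold are all assumptions and are not specified in the source.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lateral offset in metres, depth in metres)


def select_target_source(
    target: Point, sources: Sequence[Point], max_error_m: float = 1.0
) -> Optional[int]:
    """Return the index of the sound source closest to the target object.

    Returns None when no source lies within max_error_m of the target,
    i.e. the target object is probably not one of the located sources.
    """
    best_index: Optional[int] = None
    best_distance = max_error_m
    for index, source in enumerate(sources):
        distance = math.dist(target, source)  # Euclidean distance
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index


# Hypothetical scene with three located sound sources.
sources = [(-2.0, 3.0), (0.2, 3.1), (4.0, 1.0)]
target_index = select_target_source((0.0, 3.0), sources)
```

The threshold guards against assigning a distant, unrelated sound source to the target object when the target itself is silent.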

S205, acquiring the audio data of the target object collected by the audio collector from the target sound source position.

After determining the target sound source position corresponding to the target object, the audio data of the target object may be collected from the target sound source position, and then S206 is performed to identify the audio data of the target object.

S206, performing voice recognition on the audio data of the target object to obtain a second recognition result.

S207, correcting or supplementing the first recognition result based on the second recognition result to obtain a final recognition result.

According to the embodiment of the invention, the target sound source position corresponding to the target object is determined according to the position of the target object, the distance between the target object and the camera and the position of the sound source, and the audio data collected from the target sound source position is further acquired, so that the accuracy of acquiring the audio data of the target object is ensured, and the accuracy of correcting the face recognition result based on the voice recognition result is further ensured.

EXAMPLE III

Fig. 3 is a schematic structural diagram of a face recognition enhancement device in a third embodiment of the present invention. This embodiment is applicable to cases where face recognition is required, and the device may be configured on a camera provided with at least two audio collectors, or on a back-end server. Referring to Fig. 3, the device includes:

a face recognition module 301, configured to acquire a target image including a target object, and perform face recognition on the target image to obtain a first recognition result;

a voice recognition module 302, configured to acquire audio data of the target object collected by the audio collector, and perform voice recognition on the audio data to obtain a second recognition result;

and a result correction module 303, configured to correct or supplement the first recognition result based on the second recognition result, so as to obtain a final recognition result.

On the basis of the foregoing embodiment, optionally, the voice recognition module includes:

a position and distance information acquiring unit for acquiring position information of the target object in the target image and distance information of the target object from the camera;

the first positioning unit is used for positioning the source of the audio data according to the audio data collected by the audio collector to obtain at least one sound source position;

the second positioning unit is used for determining a target sound source position corresponding to the target object from at least one sound source position according to the position information and the distance information;

and the voice acquisition unit is used for acquiring the audio data of the target object collected by the audio collector from the target sound source position.

On the basis of the above embodiment, optionally, the face recognition module is specifically configured to:

On the basis of the above embodiment, optionally, the voice recognition module is specifically configured to:

input the audio data into a pre-trained voice recognition model, and obtain a second recognition result according to the output of the voice recognition model, wherein the second recognition result comprises a recognition result of at least one biological characteristic and the confidence of the recognition result of each biological characteristic.

On the basis of the foregoing embodiment, optionally, the result correction module includes:

a result correction unit, configured to compare at least one biological characteristic and its confidence in the second recognition result with at least one biological characteristic and its confidence in the first recognition result, and to correct or supplement the first recognition result according to the comparison result to obtain a final recognition result.

On the basis of the foregoing embodiment, optionally, the result correction unit is specifically configured to:

for any biological feature shared by the first recognition result and the second recognition result, if the confidence coefficient of the recognition result of the biological feature in the second recognition result is greater than that of the recognition result of the biological feature in the first recognition result, replacing the recognition result of the biological feature in the first recognition result with the recognition result of the biological feature in the second recognition result;

for any biometric feature that is present in the second recognition result and is not present in the first recognition result, the recognition result of the biometric feature is supplemented to the first recognition result.
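The two rules above can be sketched as a simple dictionary merge. The representation of a recognition result as a mapping from feature name to a (value, confidence) pair, and the feature names and values used in the example, are assumptions made for illustration only.

```python
from typing import Dict, Tuple

# feature name -> (recognized value, confidence in [0, 1])
Result = Dict[str, Tuple[str, float]]


def merge_results(first: Result, second: Result) -> Result:
    """Correct or supplement the face-based result with the voice-based one."""
    final = dict(first)  # copy, so the first recognition result is not mutated
    for feature, (value, confidence) in second.items():
        if feature not in final:
            # Rule 2: feature only present in the second result -> supplement.
            final[feature] = (value, confidence)
        elif confidence > final[feature][1]:
            # Rule 1: shared feature, second result more confident -> replace.
            final[feature] = (value, confidence)
    return final


# Hypothetical recognition results.
first_result = {"gender": ("male", 0.6), "age": ("adult", 0.9)}
second_result = {
    "gender": ("female", 0.8),
    "age": ("child", 0.5),
    "accent": ("northern", 0.7),
}
final_result = merge_results(first_result, second_result)
```

In this example the gender entry is replaced because the voice-based confidence is higher, the age entry is kept, and the accent entry is added because only the voice-based result contains it.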

The face recognition enhancement device provided by the embodiment of the invention can execute the face recognition enhancement method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

EXAMPLE IV

Fig. 4 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present invention. Fig. 4 shows a block diagram of an exemplary electronic device 12 suitable for implementing an embodiment of the present invention. In this embodiment, the electronic device may be a camera provided with an audio collector, or a back-end server. The electronic device 12 shown in Fig. 4 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in FIG. 4, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processing unit 16 executes various functional applications and data processing by running the program stored in the system memory 28, for example, to implement the face recognition enhancement method provided by the embodiment of the present invention, the method includes:

EXAMPLE V

The fifth embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for enhancing face recognition provided in the fifth embodiment of the present invention, where the method includes:

Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.