CN112202655B

Movatterモバイル変換

Info

Publication number: CN112202655B
Application number: CN202011011591.4A
Authority: CN
Inventors: 马景林; 王宝林; 孙宇飞
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2020-09-23
Filing date: 2020-09-23
Publication date: 2023-03-24
Anticipated expiration: 2040-09-23
Also published as: CN112202655A

Abstract

The application discloses intelligent electrical apparatus, image recognition method, electronic equipment and storage medium, intelligent electrical apparatus includes: the lighting device comprises a base, a connecting mechanism and a lighting module, wherein one end of the connecting mechanism is arranged on the base, and the other end of the connecting mechanism is connected with the lighting module; the intelligent electric appliance further comprises a voice recognition module and a camera module, the voice recognition module is arranged on the connecting mechanism and/or the lighting module, the camera module is arranged on the connecting mechanism and/or the lighting module, the voice recognition module is used for collecting voice information and recognizing the collected voice information, the camera module is used for acquiring images in an image collection area of the camera module according to voice recognition results generated by the voice recognition module and recognizing the collected images, and the intelligent degree of the intelligent electric appliance can be improved by the scheme.

Description

Intelligent electric appliance, image recognition method, electronic device and storage medium

Technical Field

The invention relates to the technical field of computers, in particular to an intelligent electric appliance, an image recognition method, electronic equipment and a storage medium.

Background

With the development of science and technology, more and more intelligent home devices enter the lives of people, a user can be connected with a cloud end through mobile phone software (APP), and sends a control instruction to the intelligent devices through the cloud end, so that the remote control function of the mobile phone is achieved, the cloud end is a software platform adopting an Application Virtualization technology (Application Virtualization), and the cloud technology integrates multiple functions of software searching, downloading, using, managing, backing up and the like.

However, the current smart device mainly relies on the user to control the device through the mobile phone, for example, for a desk lamp integrated with a camera, the user needs to perform complicated operations in the terminal device to complete the task of image recognition of the desk lamp, and thus, the current smart device is inconvenient for the user to operate.

Disclosure of Invention

The application provides an intelligent electric appliance, an image recognition method, an electronic device and a storage medium, which can improve the intelligent degree of the intelligent electric appliance.

The application provides an intelligence electrical apparatus, include: the lighting device comprises a base, a connecting mechanism and a lighting module, wherein one end of the connecting mechanism is arranged on the base, and the other end of the connecting mechanism is connected with the lighting module;

the connecting mechanism is further provided with a voice recognition module and a camera module, the voice recognition module is used for collecting voice information and recognizing the collected voice information, and the camera module is used for acquiring an image in an image collection area of the camera module according to a voice recognition result generated by the voice recognition module and recognizing the collected image.

Optionally, in some embodiments of the present application, the connection mechanism includes a connection portion and a support portion, the support portion is disposed on the base, the connection portion has a first end and a second end disposed oppositely, the first end is disposed on the support portion, and the second end is connected with the lighting module.

Optionally, in some embodiments of the application, a display module is further disposed on a side of the support portion facing the lighting module, and the display module is configured to display prompt information corresponding to an image recognition result according to the image recognition result generated by the camera module.

Optionally, in some embodiments of the present application, the camera module includes a first camera and a second camera, and the voice recognition module includes a first voice recognition unit, a second voice recognition unit, and a third voice recognition unit;

the first camera is arranged on one side face, facing the base, of the connecting part and is used for acquiring images in an image acquisition area of the first camera; the second camera is arranged on one side surface of the support part facing the lighting module and is used for acquiring images in an image acquisition area of the second camera;

the first voice recognition unit is arranged on one side of the base towards the connecting part, the second voice recognition unit is arranged on one side of the lighting module towards the supporting part, and the third voice recognition unit is arranged on one side of the lighting module towards the base.

Optionally, in some embodiments of the application, the first camera is specifically configured to acquire an image located in an image acquisition area of the first camera according to a voice recognition result generated by the first voice recognition unit and/or the third voice recognition unit, and recognize a text in the acquired image.

Optionally, in some embodiments of the application, the display module is specifically configured to display a prompt message corresponding to the text recognition result.

Optionally, in some embodiments of the application, the second camera is specifically configured to acquire an image located in an image acquisition area of the second camera according to a voice recognition result generated by the second voice recognition unit and/or the third voice recognition unit, and identify a human body in the acquired image.

Optionally, in some embodiments of the application, the display module is specifically configured to display prompt information corresponding to the human body gesture recognition result.

Correspondingly, the application also provides an image identification method, which is applied to the intelligent electric appliance, and the method comprises the following steps:

the voice recognition module collects voice information when an operation triggered for the lighting module is detected;

the voice recognition module recognizes the collected voice information, generates a voice recognition result of the voice information, and sends the voice recognition result to the camera module;

the camera module acquires an image in an image acquisition area of the camera module based on the voice recognition result, and recognizes the acquired image to obtain an image recognition result of the image.

The application provides an intelligent electrical apparatus and an image recognition method, the intelligent electrical apparatus includes: the lighting device comprises a base, a connecting mechanism and a lighting module, wherein one end of the connecting mechanism is arranged on the base, and the other end of the connecting mechanism is connected with the lighting module;

the intelligent electric appliance further comprises a voice recognition module and a camera module, the voice recognition module is arranged on the connecting mechanism and/or the lighting module, the camera module is arranged on the connecting mechanism and/or the lighting module, the voice recognition module is used for collecting voice information and recognizing the collected voice information, and the camera module is used for acquiring images in an image collection area of the camera module according to voice recognition results generated by the voice recognition module and recognizing the collected images.

The application provides a novel intelligent electrical apparatus, integrated speech recognition module and camera module in intelligent electrical apparatus, when the user carries out the image recognition task through intelligent electrical apparatus, need not carry out loaded down with trivial details operation in terminal equipment, speech recognition module can discern user's speech information, the camera module is according to the speech recognition result that the speech recognition module generated, acquire the image that is located the image acquisition region of camera module, and discern the image of gathering, thereby realize the image recognition to the target image, consequently, the intelligent degree of intelligent electrical apparatus has been improved.

Drawings

In order to more clearly illustrate the technical solutions in the present application, the drawings required for the description of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings may be obtained according to these drawings without inventive effort.

FIG. 1a is a schematic diagram illustrating a first perspective of a smart appliance provided herein;

fig. 1b is a schematic structural diagram of a second view angle of the smart appliance provided in the present application;

fig. 1c is another schematic structural diagram of a first view angle of the smart appliance provided in the present application;

FIG. 1d is a schematic diagram of a first perspective of the smart appliance provided herein;

FIG. 1e is a schematic flow chart of an image recognition method provided herein;

FIG. 2a is another schematic flow chart of an image recognition method provided in the present application;

FIG. 2b is a schematic diagram of an image recognition system provided in the present application;

fig. 3 is a schematic structural diagram of an electronic device provided in the present application.

Detailed Description

The technical solutions in the present application will be described clearly and completely with reference to the accompanying drawings in the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The following are detailed below. It should be noted that the description sequence of the following embodiments is not intended to limit the priority sequence of the embodiments.

The application provides an intelligent electric appliance, an image recognition method, an electronic device and a storage medium.

Referring to fig. 1a to 1d, the intelligentelectrical appliance 1 includes abase 10, aconnecting mechanism 20, alighting module 30, avoice recognition module 40, and acamera module 50, wherein one end of theconnecting mechanism 20 is disposed on thebase 10, the other end of theconnecting mechanism 20 is connected to thelighting module 30, thevoice recognition module 40 is configured to collect voice information and recognize the collected voice information, thecamera module 50 is configured to acquire an image located in an image collection area of thecamera module 50 according to a voice recognition result generated by the voice recognition module and recognize the collected image, thevoice recognition module 40 may be disposed only on theconnecting mechanism 20 or only on thelighting module 30, and thevoice recognition module 40 may be disposed on both theconnecting mechanism 20 and thelighting module 30, and similarly, thecamera module 50 may be disposed only on theconnecting mechanism 20 or only on thelighting module 30, and thecamera module 50 may be disposed on both theconnecting mechanism 20 and thelighting module 30, that is, that thevoice recognition module 40 is disposed on theconnecting mechanism 20 and/or thelighting module 30, and thecamera module 50 is disposed on theconnecting mechanism 20 or thelighting module 30, as shown in fig. 1a and theconnecting mechanism 20 and/or the lighting module 50 b.

Thebase 10 and theattachment mechanism 20 may be formed of plastic, glass, ceramic, fiber composite, metal (e.g., stainless steel, aluminum, etc.), other suitable materials, or a combination of any two or more of these materials. Theconnection mechanism 20 may be formed using a one-piece configuration in which some or all of theconnection mechanism 20 is machined or molded as a single structure, although theconnection mechanism 20 may be formed using a plurality of structures, such as the structures shown in fig. 1a and 1b, theconnection mechanism 20 may include aconnection portion 201 and asupport portion 202, thesupport portion 202 being disposed on thebase 10, theconnection portion 201 having first andsecond ends 201a and 201b disposed opposite each other, thefirst end 201a being disposed on thesupport portion 202, the second end 201b being connected to thelighting module 50.

In order to improve the interactivity between the user and the smartelectrical appliance 1, please continue to refer to fig. 1a and 1b, in some embodiments, adisplay module 60 is further disposed on a side of thesupport 202 facing thelighting module 50, and thedisplay module 60 may be configured to display a prompt corresponding to the image recognition result according to the image recognition result generated by thecamera module 50, for example, thecamera module 50 collects an image of the user and recognizes that the user is a pre-added user XX, thedisplay module 60 may display the image recognition result in a text form, for example, thedisplay module 60 may display "XX, your good" in a chinese form in its corresponding display screen, and when the user writes an article using the smartelectrical appliance 1, thecamera module 50 recognizes that the sitting posture of the user does not correspond to the preset sitting posture, thedisplay module 60 may display "please adjust your sitting posture" in its corresponding display screen, and the user may control the smartelectrical appliance 1 to turn on and off by voice, for example, the voice recognition module recognizes that the user is "adjust your sitting posture" low "in the smartelectrical appliance 1", and then the brightness of the smartelectrical appliance 30 is reduced.

Thedisplay module 60 may include a two-dimensional display or a three-dimensional display, where the three-dimensional display is also called a stereoscopic display, and the stereoscopic display is a new generation of autostereoscopic display device built on a mechanism of stereoscopic vision of human eyes. The stereoscopic display can obtain images with complete depth information by using a multichannel automatic stereoscopic reality technology without any vision-aiding equipment, adopts a microlens grating screen or a lens screen technology, is accurately aligned by a moire interferometry method, and provides different perspective images for two eyes by utilizing a group of obliquely arranged convex lens arrays and refraction only in the horizontal direction to realize a stereoscopic effect. Thedisplay module 60 may be externally hung on the supportingportion 202, or may be embedded in the supportingportion 202, as shown in fig. 1a and 1 b. In the embedded form, for example, during actual manufacturing, a groove may be formed on one side surface of the supportingportion 202, and then thedisplay module 60 is placed in the groove to form the structure shown in fig. 1a and 1 b. In addition, the groove is adapted to the structure of thedisplay module 60, for example, the structure of thedisplay module 60 is a rectangular parallelepiped, the cross section of the groove is rectangular, and the depth of the groove is the same as the thickness of thedisplay module 60.

Optionally, in some embodiments, thecamera module 50 includes athird camera 503, the 40 voice recognition module includes a fourthvoice recognition unit 404, please refer to fig. 1c in combination with fig. 1b, thecamera module 50 includes athird camera 503, the 40 voice recognition module includes a fourthvoice recognition unit 404, thelighting module 30 includes alighting unit 301 and ahousing 302, wherein thehousing 302 is connected to the second end 201b of the connecting portion, thehousing 32 has a receiving space, thelighting unit 301 is disposed in the receiving space, thethird camera 503 and the fourthvoice recognition unit 404 are disposed on thehousing 302, and thethird camera 503 is disposed toward thebase 10, that is, thesmart appliance 1 is disposed with three cameras and four voice recognition units. Of course, the intelligentelectrical appliance 1 may also be provided with only thethird camera 503 and the fourthvoice recognition unit 404, as shown in fig. 1d, please refer to the foregoing embodiment for the specific connection relationship, which is not described herein again.

Key technologies for Speech Technology (Speech Technology) are automatic Speech recognition Technology (ASR) and Speech synthesis Technology (TTS), as well as voiceprint recognition Technology. The computer can listen, see, speak and feel, and the development direction of the future human-computer interaction is provided, wherein the voice becomes one of the best viewed human-computer interaction modes in the future.

The speech recognition tasks can be roughly classified into 3 categories, i.e., isolated word recognition (isolated word recognition), keyword recognition (or keyword detection), and continuous speech recognition, according to the recognized objects. The task of the isolated word recognition is to recognize previously known isolated words, such as 'startup', 'shutdown', and the like; the task of continuous speech recognition is to recognize any continuous speech, such as a sentence or a segment of speech; keyword detection in a continuous speech stream is for continuous speech, but it does not recognize all words, but only detects where known keywords appear, such as detecting the words "computer", "world" in a passage of speech.

Speech recognition techniques can be divided into person-specific speech recognition, which can recognize only one or a few persons, and person-unspecific speech recognition, which can be used by anyone, depending on the speaker in question. Clearly, a non-human specific speech recognition system is more practical, but it is much more difficult to recognize than for a specific human.

The application field of speech recognition is very wide, and common application systems are as follows: compared with a keyboard input method, the voice input system is more in line with the daily habits of people, and is more natural and more efficient; the voice control system, namely, the operation of the equipment is controlled by voice, is more rapid and convenient compared with manual control, and can be used in a plurality of fields such as industrial control, voice dialing system, intelligent household appliances, voice control intelligent toys and the like; the intelligent dialogue inquiry system operates according to the voice of the client, and provides natural and friendly database retrieval services for the user, such as family service, hotel service, travel agency service system, ticket booking system, medical service, bank service, stock inquiry service and the like.

In the present application, after thecamera module 50 obtains the target image (i.e. the image in the image capturing area) according to the recognition result of thevoice recognition module 40, thecamera module 50 recognizes the target image to complete the image recognition task, for example, identify the user image of the user in the image capturing area of the camera module to obtain the identity information of the user, and for example, recognize the gesture image in the image capturing area of the camera module to obtain the voice corresponding to the gesture image, compared with the existing intelligent electrical appliance, the intelligent electrical appliance provided in the present application can recognize the voice information sent by the user by means of thevoice recognition module 40 to complete the image recognition task, and the user does not need to perform a cumbersome operation with a cloud terminal device to complete the image recognition task, for example, the intelligent electrical appliance provided in the present application can improve the efficiency of image recognition.

Optionally, in some embodiments, thedisplay module 50 is specifically configured to display prompt information corresponding to the text recognition result.

Optionally, in some embodiments, thedisplay module 50 is specifically configured to display prompt information corresponding to the human body gesture recognition result.

It should be noted that thefirst camera 501 may recognize the text image by using a neural network algorithm, where the neural network system is a complex network system formed by a large number of simple processing units (called neurons) and widely connected in some way, and although the structure and function of each neuron are very simple, the behavior of the network system formed by a large number of neurons is colorful and very complex. It reflects many basic features of human brain function, and is a simplification, abstraction and simulation of human brain neural network system. The syntactic method focuses on simulating the logical thinking of a human, while the neural network focuses on the sensory perception process, the visual thinking, the distributed memory and the self-learning self-organizing process in the process of simulating and realizing the cognition of the human, and is a complementary relation with the symbol processing. The neural network has the advantages of nonlinear mapping approximation, large-scale parallel distributed storage and comprehensive optimization processing, strong fault tolerance, unique associative memory and self-organization, self-adaptation and self-learning capabilities, and is particularly suitable for processing the problems of simultaneously considering a plurality of factors and conditions and information uncertainty (ambiguity or inaccuracy).

Thesecond camera 502 may adopt an existing infrared sensor scheme, a Time of flight (TOF) scheme, an ultrasonic detection scheme, or the like, and is responsible for detecting a hand movement trajectory and a gesture state of a hand of a user in a display area, wherein the infrared sensor is a sensor capable of sensing infrared rays radiated by a target and measuring by using physical properties of the infrared rays. The detector can be divided into a photon detector and a thermal detector according to a detection mechanism, wherein the photon detector utilizes interaction between photon flow of incident light radiation and electrons in a detector material so as to change the energy state of the electrons and cause various electrical phenomena; the thermal detector utilizes the thermal effect of infrared radiation, a sensitive element of the detector absorbs the radiant energy to cause temperature rise, so that certain related physical parameters are changed, and the infrared radiation absorbed by the detector is determined by measuring the change of the physical parameters; the so-called time-of-flight method is a method in which a light pulse is continuously transmitted to a target, and then light returning from the object is received by a sensor, and the distance to the target is obtained by detecting the time of flight (round trip) of the light pulse. The principle of the technology is basically similar to that of a 3D laser sensor, except that the 3D laser sensor scans point by point, and a TOF camera obtains depth information of the whole image at the same time. The TOF camera is similar to the common machine vision imaging process, and consists of a light source, an optical component, a sensor, a control circuit, a processing circuit and other units. TOF cameras have fundamentally different 3D imaging mechanisms compared to binocular measurement systems which are very similar to those belonging to the field of non-invasive three-dimensional detection and application. Binocular stereo measurement is performed by matching left and right stereo pairs and then performing stereo detection through a triangulation method, and a TOF camera acquires a target distance acquired through incident light and reflected light detection. The TOF technology adopts an active light detection mode, and unlike general illumination requirements, a TOF irradiation unit aims at distance measurement by using changes of incident light signals and reflected light signals instead of illumination, so that the TOF irradiation unit emits light after high-frequency modulation, for example, pulsed light emitted by an LED or a laser diode, and pulses can reach 100MHz. Similar to a normal camera, the front end of the TOF camera chip needs a lens for collecting light. However, unlike conventional optical lenses, a bandpass filter is required to ensure that only light with the same wavelength as the illumination light source enters. Meanwhile, the optical imaging system has a perspective effect, and scenes with different distances are concentric spherical surfaces with different diameters instead of parallel planes, so that a subsequent processing unit is required to correct the error in actual use. As the core of the camera of the TOF, each pixel of the TOF chip records the phase of incident light between the camera and the object. The sensor structure is similar to a conventional image sensor, but more complex than an image sensor, and it includes 2 or more shutters for sampling the reflected light at different times. For this reason, TOF chip pixels are much larger than typical image sensor pixel sizes, typically around 100 um. Both the illumination unit and the TOF sensor require high-speed signal control, so that a high depth measurement accuracy can be achieved. For example, a shift of 10ps in the synchronization signal between the illumination light and the TOF sensor corresponds to a displacement of 1.5 mm. And the current CPU can reach 3GHz, the corresponding clock period is 300ps, and the corresponding depth resolution is 45mm. The operation unit mainly completes data correction and calculation work, distance information can be obtained by calculating the relative phase shift relation of incident light and reflected light, the wavelength of ultrasonic waves is shorter than that of common sound waves, the ultrasonic waves have better directivity, and opaque substances can penetrate through the ultrasonic waves. Ultrasound imaging is a technique for rendering an internal image of an opaque object using ultrasound. The ultrasonic wave emitted from the transducer is focused on the opaque sample by the acoustic lens, the ultrasonic wave transmitted from the sample carries the information of the irradiated part (such as the capability of reflecting, absorbing and scattering the acoustic wave), and is focused on the piezoelectric receiver by the acoustic lens, the obtained electric signal is input into the amplifier, and the image of the opaque sample can be displayed on the fluorescent screen by the scanning system. The above device is called an ultrasonic microscope. Ultrasonic imaging techniques have gained widespread use in medical inspection, for inspecting large scale integrated circuits in the microelectronic device manufacturing industry, and for displaying areas and grain boundaries of different compositions in alloys, etc. in material science. Acoustic holography is an acoustic imaging technique for recording and reproducing a stereoscopic image of an opaque object using the interference principle of ultrasonic waves, and its principle is basically the same as that of light wave holography, except that the recording means is different (see holography). Two transducers placed in a liquid are excited with the same ultrasonic signal source, which respectively emit two coherent ultrasonic waves: one beam passes through the object under study to become the object wave, and the other beam serves as the reference wave. The object wave and the reference wave are overlapped on the liquid surface to form an acoustic hologram, the acoustic hologram is irradiated by a laser beam, and a reproduced image of the object is obtained by utilizing a diffraction effect generated when the laser is reflected on the acoustic hologram, and is usually observed in real time by a camera and a television.

The application provides an intelligent electrical apparatus, at integratedspeech recognition module 40 andcamera module 50 in intelligentelectrical apparatus 1, when the user carries out the image recognition task through intelligentelectrical apparatus 1, need not carry out loaded down with trivial details operation in terminal equipment,speech recognition module 40 can discern user's speech information,camera module 50 is according to the speech recognition result that the speech recognition module generated, acquire the image that is locatedcamera module 50's image acquisition region, and discern the image of gathering, thereby realize the image recognition to the target image, therefore, the intelligent degree of intelligentelectrical apparatus 1 has been improved, and then improve the image recognition efficiency of intelligent electrical apparatus.

Correspondingly, the application also provides an image identification method, which is applied to the intelligent electric appliance and comprises the following steps: when the operation triggered by the lighting module is detected, the voice recognition module collects voice information, the voice recognition module recognizes the collected voice information, a voice recognition result of the voice information is generated, the voice recognition result is sent to the camera module, the camera module acquires an image in an image collection area of the camera module based on the voice recognition result, the collected image is recognized, and an image recognition result of the image is obtained.

Referring to fig. 1e, fig. 1e is a schematic flow chart of an image recognition method provided in the present application. The specific flow of the three-dimensional image recognition method can be as follows:

101. when an operation triggered for the lighting module is detected, the voice recognition module collects voice information.

For example, specifically, when an operation triggered by a user for the lighting module is detected, for example, the user turns on a power supply of the smart appliance to operate the lighting module, the voice recognition module collects voice information output by the user.

102. The voice recognition module recognizes the collected voice information, generates a voice recognition result of the voice information, and sends the voice recognition result to the camera module.

For example, specifically, the speech recognition module first extracts speech from the speech information, then performs framing processing on the extracted speech, that is, the extracted speech is divided into a plurality of pieces of audio, then extracts audio features of each audio, optionally, the audio features may be Mel-Frequency cepstrum Coefficients (MFCCs) features, which can reflect human perception characteristics of the speech, and finally, the speech recognition module calls a preset acoustic model and a language model to recognize the extracted audio features, so as to obtain a speech recognition result of the speech information.

103. The camera module acquires an image in an image acquisition area of the camera module based on the voice recognition result, and recognizes the acquired image to obtain an image recognition result of the image.

In this application, in consideration of the structure of a common desk lamp, the text recognition task and the human body recognition task are separated, that is, the camera module includes a first camera and a second camera, where the first camera executes the text recognition task and the second camera executes the human body recognition task, that is, optionally, the step "the camera module obtains an image located in an image acquisition area of the camera module based on a voice recognition result, and recognizes the acquired image to obtain an image recognition result of the image" may specifically include:

the first camera acquires an image in an image acquisition area of the first camera according to the voice recognition result, and recognizes a text in the acquired image;

and the second camera acquires an image in the image acquisition area of the second camera according to the voice recognition result, and recognizes the human body in the acquired image.

In the text recognition task, when the first camera acquires a text image to be recognized, recognizing a semantic corresponding to each text element in the text image to be recognized, and then superimposing a semantic recognition result of each text element to obtain a semantic recognition result of the text to be recognized, that is, optionally, in some embodiments, the step "the first camera acquires an image located in an image acquisition area of the first camera according to a voice recognition result, and recognizes a text in the acquired image" may specifically include:

(11) When the voice recognition result indicates that the image recognition task type is text image recognition, the first camera acquires an image in an image acquisition area of the first camera to obtain a text image to be recognized;

(12) Identifying the corresponding semantics of each text element in the text image to be identified;

(13) The semantic recognition results of all the text elements are superposed to obtain the semantic recognition result of the text to be recognized

The text image to be recognized comprises a text to be detected, the text to be detected comprises a plurality of text elements, wherein the text elements refer to various elements in the text to be detected, such as characters, symbols and the like, for example, specifically, the first camera can perform semantic segmentation on the image to be detected to obtain pixel points corresponding to each text element, wherein the semantic segmentation of the image is to perform pixel-level recognition and segmentation to obtain category information and accurate position information of an object in the image, then, the first camera classifies the pixel points of the text image to be recognized by adopting a text detection model to determine the pixel points belonging to the text elements, and finally, the first camera recognizes the text semantics of the text to be detected based on the determined pixel points.

And in the human body recognition task, specifically including tasks such as gesture recognition, human gesture recognition and face recognition, for the intelligent electrical apparatus of this application, the user can pass through gesture control intelligent electrical apparatus, that is, optionally, in some embodiments, step "the second camera identifies the human body in the image of gathering", specifically may include:

(21) When the voice recognition result indicates that the image recognition task type is gesture image recognition, the second camera acquires an image set of gesture operation triggered by a user aiming at the second camera to obtain a gesture image set;

(22) And the second camera identifies gesture operation triggered by the user aiming at the second camera according to the gesture image set.

For example, the second camera may obtain a group of images or a video stream corresponding to a gesture operation triggered by the user for the second camera, and then the second camera recognizes a gesture of each frame of image, so as to complete recognition of the gesture operation triggered by the user for the second camera.

Further, the second camera may identify a gesture operation type of each gesture image in the gesture images, and then select a reference gesture image corresponding to the identified gesture operation type from a preset gesture library to obtain a reference gesture image matched with each gesture image, that is, optionally, in some embodiments, the step "the second camera identifies, according to the gesture image set, a gesture operation triggered by the user with respect to the second camera" may specifically include:

(31) The second camera selects a reference gesture image matched with each gesture image in the gesture image set from a preset gesture library;

(32) And the second camera identifies the gesture operation triggered by the user aiming at the second camera based on the reference gesture image matched with each gesture image.

After the reference gesture images matched with the gesture images are obtained, the second camera can determine the meaning expressed by the user for the gesture operation triggered by the second camera according to the sequence of the gesture images in the gesture image set and the reference gesture images matched with the gesture images.

In addition, the smart appliance may also recognize a body posture (human body posture) of the user, that is, optionally, in some embodiments, the step "the second camera recognizes a human body in the acquired image" may specifically include:

(41) When the voice recognition result indicates that the image recognition task type is human body posture image recognition, the second camera acquires a current human body posture image of the user;

(42) Selecting a reference human body posture image matched with the human body posture image from a preset human body posture library by a second camera;

(43) And determining the human body posture type corresponding to the reference human body posture image as the human body posture type of the current human body posture image of the user.

For example, specifically, when the posture of the user does not satisfy the preset condition, for example, when the user writes an article under the intelligent electrical appliance, and when the intelligent electrical appliance recognizes that the sitting posture of the user does not match the sitting posture pre-stored by the intelligent electrical appliance, the display module may display a prompt message of "please adjust the sitting posture" so that the user can adjust the sitting posture of the user.

According to the image recognition method, when a user executes an image recognition task through the intelligent electric appliance, complex operation does not need to be carried out in the terminal equipment, the voice recognition module can recognize voice information of the user, the camera module acquires an image located in an image acquisition area of the camera module according to a voice recognition result generated by the voice recognition module, and recognizes the acquired image, so that image recognition of a target image is achieved, therefore, the intelligent degree of the intelligent electric appliance is improved, and further the image recognition efficiency of the intelligent electric appliance is improved.

The method according to the examples is further described in detail below by way of example.

In this embodiment, the image recognition method will be described as being specifically integrated in an intelligent electrical appliance.

Referring to fig. 2a, an image recognition method may include the following specific steps:

201. when the intelligent electric appliance detects the operation triggered by the lighting module, the voice recognition module of the intelligent electric appliance collects voice information.

For example, specifically, when the smart appliance detects an operation triggered by a user for the lighting module, for example, the user turns on a power supply of the smart appliance so that the lighting module operates, the voice recognition module of the smart appliance collects voice information output by the user.

202. The voice recognition module of the intelligent electric appliance recognizes the collected voice information, generates a voice recognition result of the voice information, and sends the voice recognition result to the camera module.

For example, specifically, the speech recognition module of the smart electrical appliance first extracts speech from the speech information, then performs framing processing on the extracted speech, and finally calls a preset acoustic model and a language model to recognize the extracted audio features to obtain a speech recognition result of the speech information.

203. The camera module of the intelligent electric appliance acquires the image in the image acquisition area of the camera module based on the voice recognition result, and recognizes the acquired image to obtain the image recognition result of the image.

For example, the camera module of the intelligent electrical appliance acquires images, such as human face images, human body images and the like, located in the image acquisition area of the camera module, and then the camera module of the intelligent electrical appliance identifies the acquired images to obtain the image identification result of the images

To facilitate further understanding of the image recognition scheme of the present application, please refer to fig. 2b, which further exemplifies a structure of the image recognition system of the present application, as shown in the figure, in an embodiment, the system includes a smart appliance, an applet, an account binding login page, and a gateway, where an account service is responsible for maintaining a binding relationship between a device and a social main account, and maintaining device-associated child information. For the identification of the user, the first camera on the equipment is relied on. Specifically, parents enter identity information of a user, namely a child A, a child B and a child C, such as a child name, a school where the child A is located, a class of the school where the child B is located and the like, into an applet in advance, and the child B completes an after-class operation by using an intelligent electric appliance, wherein two schemes can be adopted in the application;

the first scheme comprises the following steps: when the child B writes a post-school assignment, the first camera of the intelligent electric appliance collects images in real time, for example, the image corresponding to a post-school assignment cover and the image corresponding to the content written by the child B are collected, after the child B writes the post-school assignment, the intelligent electric appliance displays whether to submit the assignment on a display screen of the intelligent electric appliance, meanwhile, the voice recognition module collects voice information of the child B, when the collected voice information indicates that the intelligent electric appliance submits the post-school assignment, the intelligent electric appliance sets the child B as a default user according to the image collected by the first camera, and uploads the image corresponding to the content written by the child B to a corresponding cloud end, in addition, when the second camera can recognize that the sitting posture of the child B is not matched with the sitting posture pre-stored by the intelligent electric appliance, the display module can display prompting information of 'requiring adjustment', so that the child B can adjust the sitting posture; in some embodiments, the second camera may further recognize a gesture operation triggered by the child B with respect to the second camera to control the smart appliance, for example, brightness of the lighting module may be increased, or work of the smart appliance may be turned off, and the like.

Scheme II: after the child B completes writing the post-lesson homework, the child B opens the first camera of the intelligent electrical appliance, and shoots the written post-lesson homework by using the first camera, specifically, an Optical Character Recognition (OCR) is performed on an image submitted at a regular time of 100ms, and the device terminal stops shooting Recognition operation after recognizing the name. The children identity recognition carries out OCR recognition on a scanning area, recognized characters are matched with names of pre-added children information in real time, when the matching is successful, the informing device stops recognition automatically, account service switching equipment is called to enable the children to be acquiescent to be recognized, after the written post-class operation is shot, the intelligent electric appliance displays 'whether to submit the operation' on a display screen of the intelligent electric appliance, meanwhile, the voice recognition module collects voice information of the children B, and when the collected voice information indicates that the intelligent electric appliance submits the post-class operation, an image corresponding to the content written by the children B is uploaded to a corresponding cloud end.

The application provides an image recognition system, when the user carries out the image recognition task through intelligent electrical apparatus, need not carry out loaded down with trivial details operation in terminal equipment, speech recognition module can discern user's speech information, the camera module is according to the speech recognition result that the speech recognition module generated, acquire the image that is located the image acquisition region of camera module, and discern the image of gathering, thereby realize the image recognition to the target image, consequently, the intelligent degree of intelligent electrical apparatus has been improved, and then improve the image recognition efficiency of intelligent electrical apparatus.

In addition, the present application also provides an electronic device, as shown in fig. 3, which shows a schematic structural diagram of the electronic device related to the present application, specifically:

the electronic device may include components such as aprocessor 31 of one or more processing cores,memory 32 of one or more computer-readable storage media, apower supply 33, and aninput unit 34. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 3 does not constitute a limitation of the electronic device and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. Wherein:

theprocessor 31 is a control center of the electronic device, connects various parts of the whole electronic device by various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in thememory 32 and calling data stored in thememory 32, thereby performing overall monitoring of the electronic device. Alternatively,processor 31 may include one or more processing cores; preferably, theprocessor 31 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into theprocessor 31.

Thememory 32 may be used to store software programs and modules, and theprocessor 31 executes various functional applications and data processing by operating the software programs and modules stored in thememory 32. Thememory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data created according to use of the electronic device, and the like. Further, thememory 32 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device. Accordingly, thememory 32 may also include a memory controller to provide theprocessor 31 access to thememory 32.

The electronic device further comprises apower supply 33 for supplying power to the various components, and preferably, thepower supply 33 is logically connected to theprocessor 31 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are realized through the power management system. Thepower supply 33 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.

The electronic device may also include aninput unit 34, theinput unit 34 being operable to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.

Although not shown, the electronic device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, theprocessor 31 in the electronic device loads the executable file corresponding to the process of one or more application programs into thememory 32 according to the following instructions, and theprocessor 31 runs the application programs stored in thememory 32, so as to implement various functions as follows:

when the operation triggered by the lighting module is detected, the voice recognition module collects voice information, the voice recognition module recognizes the collected voice information, a voice recognition result of the voice information is generated, the voice recognition result is sent to the camera module, the camera module acquires an image in an image collection area of the camera module based on the voice recognition result, the collected image is recognized, and an image recognition result of the image is obtained.

The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.

To this end, the present application provides a storage medium having stored therein a plurality of instructions that can be loaded by a processor to perform the steps of any of the image recognition methods provided herein. For example, the instructions may perform the steps of:

Wherein the storage medium may include: read Only Memory (ROM), random Access Memory (RAM), magnetic or optical disks, and the like.

Since the instructions stored in the storage medium can execute the steps in any image recognition method provided by the present application, the beneficial effects that any image recognition method provided by the present application can achieve can be achieved, for details, see the foregoing embodiments, and are not described herein again.

According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided in the various alternative implementations described above.

The intelligent electrical appliance, the image recognition method, the electronic device and the storage medium provided by the application are described in detail, specific examples are applied in the description to explain the principle and the implementation of the invention, and the description of the above embodiments is only used to help understand the method and the core idea of the invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. An intelligent appliance, comprising: the lighting device comprises a base, a connecting mechanism and a lighting module, wherein one end of the connecting mechanism is arranged on the base, and the other end of the connecting mechanism is connected with the lighting module; the intelligent electric appliance belongs to an image recognition system, the image recognition system further comprises an applet, identity information of a plurality of users is input into the applet, and the identity information comprises names of the plurality of users;

the intelligent electric appliance further comprises a voice recognition module and a camera module, the voice recognition module is arranged on the connecting mechanism and/or the lighting module, the camera module is arranged on the connecting mechanism and/or the lighting module, the voice recognition module is used for collecting voice information, recognizing the collected voice information, generating a voice recognition result of the voice information and sending the voice recognition result to the camera module, and the camera module is used for acquiring an image in an image collection area of the camera module according to the voice recognition result generated by the voice recognition module and recognizing the collected image;

the connecting mechanism comprises a connecting part and a supporting part, the supporting part is arranged on the base, the connecting part is provided with a first end and a second end which are oppositely arranged, the first end is arranged on the supporting part, the second end is connected with the lighting module,

the voice recognition module comprises a first voice recognition unit, a second voice recognition unit and a third voice recognition unit, the first voice recognition unit is arranged on one side surface of the connecting part facing the base, the second voice recognition unit is arranged on one side surface of the supporting part facing the lighting module, and the third voice recognition unit is arranged on one side of the base facing the lighting module;

a display module is further arranged on one side surface of the supporting part facing the lighting module and used for displaying prompt information corresponding to an image recognition result according to the image recognition result generated by the camera module;

the camera module comprises a first camera and a second camera, the first camera is arranged on one side surface of the connecting part facing the base, and the second camera is arranged on one side surface of the supporting part facing the illumination module;

the first camera is specifically used for acquiring an image in an image acquisition area of the first camera and identifying a text in the acquired image when a voice recognition result generated by the first voice recognition unit and/or the third voice recognition unit indicates that the image recognition task type is text image recognition;

the image recognition system is used for recognizing the image collected by the first camera, and after recognizing the name, carrying out name matching on the recognized name and the identity information of a plurality of users which is input in advance; when the matching is successful, switching the default user of the intelligent electric appliance to be the identified user; after the first camera finishes shooting the operation, controlling the display module to display prompt information for prompting whether the operation is submitted, and uploading an image of the operation to a cloud end when the voice information acquired by the voice recognition module indicates the intelligent electric appliance to submit the operation;

the second camera is specifically configured to obtain an image located in an image acquisition area of the second camera according to a voice recognition result generated by the second voice recognition unit and/or the third voice recognition unit, and recognize a human body in the acquired image, and includes:

when the voice recognition result generated by the second voice recognition unit and/or the third voice recognition unit indicates that the image recognition task type is gesture image recognition, the second camera is used for acquiring an image set of gesture operation triggered by a user aiming at the second camera to obtain a gesture image set, and according to the gesture image set, gesture operation triggered by the user aiming at the second camera is recognized;

when the voice recognition result indicates that the image recognition task type is human body posture image recognition, the second camera is used for acquiring the current human body posture image of the user, and the human body posture type of the current human body posture image of the user is determined based on a preset human body posture library;

the third voice recognition unit is used as an auxiliary voice recognition unit and is used for assisting in completing voice recognition when the noise of the voice information acquired by the first voice recognition unit or the second voice recognition unit is large.

2. The intelligent electrical appliance according to claim 1, wherein the display module is specifically configured to display a prompt message corresponding to the text recognition result of the first camera.

3. The intelligent electrical appliance according to claim 1, wherein the display module is specifically configured to display prompt information corresponding to the human body gesture recognition result of the second camera.

4. The smart appliance of claim 1, wherein the camera module comprises a third camera, the voice recognition module comprises a fourth voice recognition unit, the lighting module comprises a lighting unit and a housing;

the shell is connected with the second end of the connecting portion, the shell is provided with an accommodating space, the lighting unit is arranged in the accommodating space, the third camera and the fourth voice recognition unit are arranged on the shell, and the third camera faces the base.

5. An image recognition method is applied to an intelligent electric appliance, and is characterized in that the intelligent electric appliance comprises: the intelligent electrical appliance image recognition system comprises a base, a connecting mechanism, a lighting module, a voice recognition module, a camera module and a display module, wherein the connecting mechanism comprises a connecting part and a supporting part, the supporting part is arranged on the base, the connecting part is provided with a first end and a second end which are oppositely arranged, the first end is arranged on the supporting part, the second end is connected with the lighting module, the voice recognition module comprises a first voice recognition unit, a second voice recognition unit and a third voice recognition unit, the first voice recognition unit is arranged on one side face, facing the base, of the connecting part, the second voice recognition unit is arranged on one side face, facing the lighting module, of the supporting part, the third voice recognition unit is arranged on one side face, facing the lighting module, of the base, the camera module comprises a first camera and a second camera, the first camera is arranged on one side face, facing the base, of the connecting part, the second camera is arranged on one side face, facing the lighting module, of the supporting part, the intelligent electrical appliance image recognition system also comprises a small program, a plurality of identity information of users is recorded in the small program, and the identity information comprises a plurality of names; the method comprises the following steps:

the camera module acquires an image in an image acquisition area of the camera module based on the voice recognition result, and recognizes the acquired image to obtain an image recognition result of the image;

the display module displays prompt information corresponding to the image recognition result;

the method for acquiring the image in the image acquisition area of the camera module based on the voice recognition result by the camera module and recognizing the acquired image to obtain the image recognition result of the image comprises the following steps:

when the voice recognition result generated by the first voice recognition unit and/or the third voice recognition unit indicates that the image recognition task type is text image recognition, the first camera acquires an image positioned in an image acquisition area of the first camera and recognizes a text in the acquired image; the image recognition system is used for recognizing the image collected by the first camera, and after recognizing the name, performing name matching on the recognized name and the identity information of a plurality of users which is input in advance; when the matching is successful, switching the default user of the intelligent electric appliance to be the identified user; after the first camera finishes shooting the operation, controlling the display module to display prompt information for prompting whether the operation is submitted, and uploading an image of the operation to a cloud end when the voice information acquired by the voice recognition module indicates the intelligent electric appliance to submit the operation;

when the voice recognition result generated by the second voice recognition unit and/or the third voice recognition unit indicates that the image recognition task type is gesture image recognition, the second camera acquires an image set of gesture operation triggered by a user aiming at the second camera to obtain a gesture image set, and according to the gesture image set, gesture operation triggered by the user aiming at the second camera is recognized;

when the voice recognition result indicates that the image recognition task type is human body posture image recognition, the second camera acquires the current human body posture image of the user, and the human body posture type of the current human body posture image of the user is determined based on a preset human body posture library;

6. The method of claim 5, wherein the first camera identifies text in the captured image, comprising:

when the voice recognition result indicates that the image recognition task type is text image recognition, the first camera acquires an image in an image acquisition area of the first camera to obtain a text image to be recognized, wherein the text image to be recognized comprises a text to be recognized, and the text to be recognized comprises a plurality of text elements;

identifying the corresponding semantics of each text element in the text image to be identified;

and superposing the semantic recognition results of the text elements to obtain the semantic recognition result of the text to be recognized.

7. The method of claim 5, wherein the second camera recognizes a gesture operation triggered by a user with respect to the second camera according to the gesture image set, and the gesture operation comprises:

the second camera selects a reference gesture image matched with each gesture image in the gesture image set from a preset gesture library;

and the second camera identifies gesture operation triggered by the user aiming at the second camera based on the reference gesture image matched with each gesture image.

8. The method of claim 5, wherein the second camera identifies a human in the captured image, comprising:

when the voice recognition result indicates that the image recognition task type is human body posture image recognition, the second camera acquires a current human body posture image of the user;

the second camera selects a reference human body posture image matched with the human body posture image from a preset human body posture library;

and determining the human body posture type corresponding to the reference human body posture image as the human body posture type of the current human body posture image of the user.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the image recognition method according to any of claims 5-8 are implemented when the program is executed by the processor.