Disclosure of Invention
The invention provides an intelligent device and an image processing method for improving the efficiency with which a user searches for images.
In a first aspect, the present invention provides an intelligent device, comprising: a memory for storing images and the preset labels of the images; a receiver for receiving a search instruction of a user; a processor configured to extract n preset labels from the search instruction, where n is a positive integer, and to perform layer-by-layer label matching with the images in the memory according to the importance degrees of the n preset labels to determine k target images, where k is a positive integer; and a display for displaying information of the k target images. The layer-by-layer label matching comprises: for the ith preset label in the search instruction, matching the ith preset label against the (i-1)th matching result of the (i-1)th preset label in the search instruction to determine the ith matching result of the ith preset label; if the number k of target images in the ith matching result is smaller than a first threshold, stopping the matching; otherwise, continuing to match the (i+1)th preset label in the search instruction against the ith matching result to determine the (i+1)th matching result; wherein the importance degree of the ith preset label is greater than that of the (i+1)th preset label, and i is a positive integer less than or equal to n.
Illustratively, the memory of the smart device stores images together with their corresponding preset labels. The receiver of the intelligent device receives a search instruction input by a user, and the processor of the intelligent device processes the search instruction to extract its n preset labels, which can be sorted by importance degree. The intelligent device then matches the preset labels of the images in order of the importance degrees of the n preset labels: the images are first searched according to the most important preset label (the first preset label) in the search instruction, yielding k target images that include the first preset label, i.e., the first matching result. When k is greater than or equal to a first threshold, a second matching result including the second preset label is searched for within the first matching result; when the number of corresponding target images is still greater than or equal to the first threshold, a third matching result including the third preset label is searched for within the second matching result, and so on, until the number k of target images in the jth matching result is smaller than the first threshold, at which point the matching stops and information of the target images of the jth matching result is output on the display screen. By matching the image labels in order of the importance degrees of the preset labels, the number of target images in the determined matching result is kept appropriate and the obtained target images are the images the user really wants to find, avoiding the prior-art problem that the number of target images obtained by keyword matching is too large or too small, and thus improving the user experience.
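A minimal sketch of this layer-by-layer matching follows, assuming each stored image carries a flat set of preset labels; the function and variable names are illustrative, not part of the claimed device:

```python
def match_layer_by_layer(images, preset_labels, first_threshold):
    """Narrow the candidate set one preset label at a time.

    images: list of (image_id, tag_set) pairs.
    preset_labels: ordered from most to least important.
    Matching stops as soon as a result falls below first_threshold.
    """
    candidates = images
    for label in preset_labels:                        # i-th preset label
        result = [img for img in candidates if label in img[1]]
        if len(result) < first_threshold:              # k < first threshold
            return result                              # stop matching here
        candidates = result                            # continue with (i+1)-th
    return candidates                                  # all n labels matched
```

If all n labels are matched and the result still meets the threshold, the deduplication and definition filtering described later can be applied to the returned set.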
In one possible design, the preset label includes a person identity label. The processor is configured to perform face feature recognition on the images in the memory and determine a recognition result. Further, if the similarity between the recognition result and a user's face features in a person database is greater than a second threshold, the person identity label corresponding to those face features is determined as the person identity label of the image.
Illustratively, the intelligent device receives an image uploaded by a user and stores it in the memory; the processor of the intelligent device then processes the image and identifies its corresponding preset labels. Face feature recognition is performed on the image to determine a recognition result. If the similarity between the recognition result and the pre-stored face features of a user is greater than the second threshold, the person identity label of the image is directly determined to be that user. By determining the person identity label of an image through face recognition, this scheme helps the intelligent device subsequently match images against the preset labels and improves the search efficiency.
In one possible design, the processor is further configured to determine, when the similarity is smaller than the second threshold and larger than a third threshold, the age difference between the person age label of the image and the person age label corresponding to the user's face features; if the recognition error corresponding to the age difference is not smaller than the similarity error, the person identity label corresponding to the user's face features is determined as the person identity label of the image, where the similarity error is the difference between the second threshold and the similarity. By performing face recognition on the image to determine the person identity label of the face, this design helps the intelligent device subsequently match images against the preset labels and improves the search efficiency.
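The identity decision just described can be sketched as follows; the threshold values and the linear scaling of the allowable error with the age difference are assumptions drawn from the worked example later in this description, not fixed by the design itself:

```python
def assign_identity_tag(similarity, image_age, db_age,
                        second_threshold=0.90, third_threshold=0.85,
                        error_per_ten_years=0.12):
    """Decide whether a recognized face inherits the database identity label.

    similarity: face-feature similarity score in [0, 1].
    image_age / db_age: person age labels of the image and the database user.
    """
    if similarity > second_threshold:
        return True                                   # direct match
    if similarity > third_threshold:
        # Similarity error: gap between the second threshold and the score.
        similarity_error = second_threshold - similarity
        # Assumed: allowed error grows linearly, 12% per 10 years of gap.
        allowed_error = error_per_ten_years * abs(image_age - db_age) / 10
        return allowed_error >= similarity_error      # "not smaller" condition
    return False                                      # below third threshold

# Worked example from the detailed description: 89% similarity, ages 18 vs 28.
print(assign_identity_tag(0.89, 18, 28))  # True: 12% allowed error >= 1% error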
In one possible design, the n preset labels include preset labels corresponding to center participles and preset labels corresponding to auxiliary participles, where the importance degree of a preset label corresponding to a center participle is greater than that of a preset label corresponding to an auxiliary participle. Distinguishing these two importance degrees lays the foundation for the subsequent layer-by-layer matching by the intelligent device according to the preset labels, improving the search efficiency and helping find the images the user really wants.
In one possible design, the search instruction is voice information, and the smart device further includes a voice collector for collecting the voice information. The processor is further configured to recognize corresponding text information and intonation information from the voice information, and to determine, from the text information and intonation information, the center participles and their corresponding preset labels as well as the auxiliary participles and their corresponding preset labels. This prepares the intelligent device for the subsequent layer-by-layer matching according to the preset labels.
In one possible design, the processor is specifically configured to, when the voice information is determined to be a declarative sentence according to the intonation information, take the subject, predicate, and object in the text information as center participles and extract their corresponding preset labels, and take the attributive, adverbial, and complement components in the text information as auxiliary participles and extract their corresponding preset labels. This helps determine the preset labels of the voice information and improves the matching efficiency.
In one possible design, the receiver is further configured to receive images uploaded by a user, and the processor is further configured to perform recognition processing on the uploaded images and determine the preset labels corresponding to each image; the preset labels include at least one of a person identity label, a person age label, a definition label, a scene label, an address label, and a time information label. Recognizing the uploaded images in this way prepares the intelligent device for the subsequent layer-by-layer matching according to the preset labels.
In one possible design, the processor is further configured to, when the number of target images in the nth matching result is greater than or equal to the first threshold, perform deduplication processing on the target images and/or remove target images whose definition does not meet a preset condition. The number of target images thus obtained meets the user's needs, achieves a good display effect, and improves the user experience.
In a possible design, the first threshold is determined according to the size of the display screen and the display size of the information of the target image, so that a better display effect is achieved, and the user experience is improved.
The invention has the following beneficial effects: images in the intelligent device can be searched quickly and accurately, human-computer interaction efficiency is improved, and the intelligent device can quickly identify the user's intention, thereby accurately finding target images of high quality and appropriate quantity.
In a second aspect, an embodiment of the present invention further provides an image processing method, comprising: receiving a search instruction of a user and extracting n preset labels from the search instruction; and performing, by the intelligent device, layer-by-layer label matching with the images in order of the importance degrees of the n preset labels to determine k target images, where k is a positive integer. The layer-by-layer label matching comprises: for the ith preset label in the search instruction, matching the ith preset label against the (i-1)th matching result of the (i-1)th preset label in the search instruction to determine the ith matching result; if the number k of target images in the ith matching result is smaller than a first threshold, stopping the matching; otherwise, continuing to match the (i+1)th preset label in the search instruction against the ith matching result to determine the (i+1)th matching result; wherein the importance degree of the ith preset label is greater than that of the (i+1)th preset label, and i is a positive integer less than or equal to n. Finally, the intelligent device displays information of the k target images.
In one possible design, the intelligent device performs face feature recognition on the image to determine a recognition result, and if the similarity between the recognition result and a user's face features in the person database is greater than a second threshold, determines the person identity label corresponding to those face features as the person identity label of the image.
In a possible design, when the intelligent device determines that the similarity is smaller than the second threshold and larger than the third threshold, it determines the age difference between the person age label of the image and the person age label corresponding to the user's face features. If the recognition error corresponding to the age difference is not smaller than the similarity error, the person identity label corresponding to the user's face features is determined as the person identity label of the image, where the similarity error is the difference between the second threshold and the similarity.
In one possible design, the n preset labels include a preset label corresponding to the central participle and a preset label corresponding to the auxiliary participle, and the importance degree of the preset label corresponding to the central participle is greater than that of the preset label corresponding to the auxiliary participle.
In one possible design, when the search instruction is voice information, the intelligent device collects the voice information, then recognizes corresponding text information and intonation information according to the voice information, and further determines a center participle and a preset label corresponding to the center participle, an auxiliary participle and a preset label corresponding to the auxiliary participle according to the text information and the intonation information.
In a possible design, the intelligent device determines the center participle and the preset label corresponding to the center participle, the auxiliary participle and the preset label corresponding to the auxiliary participle according to the text information and the intonation information, and the method comprises the following steps:
when the voice information is determined to be a declarative sentence according to the intonation information, taking the subject, predicate, and object in the text information as center participles and extracting their corresponding preset labels; and taking the attributive, adverbial, and complement components in the text information as auxiliary participles and extracting their corresponding preset labels.
In one possible design, the intelligent device receives images uploaded by a user, identifies the uploaded images, and determines a preset label corresponding to each image. The preset label comprises at least one of a person identity label, a person age label, a definition label, a scene label, an address label and a time information label.
In one possible design, when the number of target images in the nth matching result is greater than or equal to a first threshold, performing deduplication processing on the target images; and/or removing the image with the definition not meeting the preset condition in the target image.
In one possible design, the first threshold is determined based on a size of the display screen and a display size of the information of the target image.
An embodiment of the present invention provides a computing device, including a memory for storing a computer program; and a processor for calling the computer program stored in the memory and executing the image processing method according to the obtained program.
An embodiment of the present invention provides a computer-readable non-volatile storage medium including a computer-readable program, which, when read and executed by a computer, causes the computer to execute any one of the image processing methods described above.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention is further described with reference to the accompanying drawings and examples. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus their repetitive description will be omitted. The words expressing the position and direction described in the present invention are illustrated in the accompanying drawings, but may be changed as required and still be within the scope of the present invention. The drawings of the present invention are for illustrative purposes only and do not represent true scale.
Fig. 1 is a schematic view of an application scenario of an intelligent terminal according to an embodiment of the present invention, which includes an intelligent device 20 and a client 10. Optionally, the smart device 20 may be a digital television, a web television, an Internet Protocol Television (IPTV), a smartphone, or an electronic photo album; the client 10 may be a smartphone, a wearable device, or the like.
For example, the client 10 may install a software application paired with the smart device 20, establishing connection and communication through a network communication protocol for one-to-one control operation and data communication. For instance, the client 10 and the smart device 20 may establish a control instruction protocol, implementing functions such as key presses by operating the function keys or virtual controls of the user interface provided on the client 10; the client 10 may also receive a search instruction in voice or text form from the user and send the corresponding search instruction to the smart device 20 through the connection. The audio and video content displayed on the client 10 can also be transmitted to the display 275 of the smart device 20 to realize a synchronous display function.
As shown in the hardware configuration block diagram of the smart device 20 in Fig. 2, the smart device 20 may include a communicator 220, a detector 230, a controller 250 including a processor 254, a memory 260, and a display 275. Both the communicator 220 and the detector 230 may correspond to the receiver in the present embodiment; that is, each only has to receive the search instruction input by the user. In other words, the user may input the search instruction through the client 10, which then transmits it to the smart device 20 through the communicator 220, or the user may directly issue a search instruction in voice or text form to the smart device 20 through the detector 230.
In some smart devices 20, such as televisions, the smart device 20 further includes a tuner demodulator 210, a detector 230, an external device interface 240, a user interface 265, a video processor 270, an audio processor 280, an audio output interface 285, a power supply 290, and the like.
Specifically, the memory 260 is used for storing the images and their preset labels; the receiver is used for receiving the search instruction input by the user; and the controller 250 is configured to perform label matching among the images according to the search instruction and find an appropriate number of the images the user wants, where the receiver may be the communicator 220 or the detector 230.
The specific functions of the respective components in the present embodiment are described below.
The communicator 220 is a component for communicating with an external device or an external server according to various communication protocol types. For example, the smart device 20 may transmit content data to an external device, such as the client 10, connected via the communicator 220, or browse and download content data from an external device connected via the communicator 220. The communicator 220 may include network or near-field communication protocol modules, such as a WIFI module 221, a Bluetooth communication protocol module 222, and a wired Ethernet communication protocol module 223, so that the communicator 220 can, under the control of the controller 250, receive control signals of the client 10 carried as WIFI signals, Bluetooth signals, radio-frequency signals, and the like.
The detector 230 is a component of the smart device 20 for collecting signals of the external environment or of interaction with the outside. The detector 230 may include a sound collector 231, such as a microphone, which may be used to receive the user's sound, such as a control instruction issued to the smart device 20 in the form of voice information; alternatively, ambient sounds may be collected to identify the type of environmental scene, enabling the smart device 20 to adapt to ambient noise.
In some other exemplary embodiments, the detector 230 may further include an image collector 232, such as a camera or video camera, which may be configured to collect external environment scenes to adaptively change the display parameters of the smart device 20, and to collect user attributes or interaction gestures to achieve interaction between the smart device 20 and the user.
In some other exemplary embodiments, the detector 230 may further include a light receiver for collecting the ambient light intensity, so that the smart device 20 can adapt its display parameters accordingly.
In some other exemplary embodiments, the detector 230 may further include a temperature sensor; by sensing the ambient temperature, the smart device 20 may adaptively adjust the display color temperature of the image. For example, when the ambient temperature is high, the smart device 20 may be adjusted to display images with a cool color tone; when the ambient temperature is low, the smart device 20 may be adjusted to display images with a warm color temperature.
The controller 250 controls the operation of the smart device 20 and responds to the user's operations by running various software control programs (e.g., an operating system and various application programs) stored in the memory 260.
The controller 250 includes a Random Access Memory (RAM) 251, a Read-Only Memory (ROM) 252, a graphics processor 253, a processor 254, a communication interface 255, and a communication bus 256. The RAM 251, the ROM 252, the graphics processor 253, the processor 254, and the communication interface 255 are connected by the communication bus 256.
The ROM 252 stores various system boot instructions. When a power-on signal is received, the smart device 20 starts to boot: the processor 254 executes the system boot instructions in the ROM 252 and copies the operating system stored in the memory 260 to the RAM 251 to start running the operating system. After the operating system has started, the processor 254 copies the various applications in the memory 260 to the RAM 251 and then starts running them.
The graphics processor 253 generates various graphics objects, such as icons, operation menus, and graphics displayed in response to user input instructions. The graphics processor 253 may include an operator that processes the various interactive instructions input by the user and displays the various objects according to their display attributes, and a renderer that generates the various objects based on the operator and displays the rendered result on the display 275.
The processor 254 executes the operating system and application program instructions stored in the memory 260, and processes various applications, data, and content according to received user input instructions, so as to finally display and play various audio and video content.
In some exemplary embodiments, the processor 254 may include a plurality of processors: one main processor and one or more sub-processors. The main processor performs some initialization operations of the smart device 20 in a pre-load mode and/or operations of displaying a screen in the normal mode; the sub-processors perform operations while the smart device 20 is in a standby mode or the like.
The communication interface 255 may include a first interface through an nth interface. These interfaces may be network interfaces connected to external devices via a network.
The controller 250 may control the overall operation of the smart device 20. For example, in response to receiving a user input command for selecting a Graphical User Interface (GUI) object displayed on the display 275, the controller 250 may perform the operation related to the object selected by the user input command.
The object may be any selectable object, such as a hyperlink or an icon. The operation related to the selected object may be, for example, displaying the linked hyperlink page, document, or image, or executing the program corresponding to the object. The user input command for selecting the GUI object may be a command input through various input means (e.g., a mouse, keyboard, or touch pad) connected to the smart device 20, or a voice command corresponding to speech uttered by the user.
Specifically, the controller 250 is configured to extract the n preset labels from the search instruction, perform layer-by-layer label matching with the images in the memory 260 in order of the importance degrees of the n preset labels, and determine the k target images. The layer-by-layer label matching comprises: for the ith preset label, matching the ith preset label against the (i-1)th matching result of the (i-1)th preset label to determine the ith matching result; if the number k of target images in the ith matching result is smaller than the first threshold, stopping the matching; otherwise, continuing to match the (i+1)th preset label against the ith matching result to determine the (i+1)th matching result; wherein the importance degree of the ith preset label is greater than that of the (i+1)th preset label. The memory 260 stores various types of data, software programs, or applications that drive and control the operation of the smart device 20. The memory 260 may include volatile and/or non-volatile memory, and the term "memory" includes the memory 260, the RAM 251 and ROM 252 of the controller 250, and any memory card in the smart device 20.
In some embodiments, the memory 260 is specifically used for storing the operating program that drives the controller 250 in the smart device 20; storing the various applications built into the smart device 20 and those downloaded by the user from external devices; and storing data for configuring the various GUIs provided by the display 275, such as visual-effect images, the various objects related to the GUIs, and selectors for selecting GUI objects.
In some embodiments, the memory 260 is specifically configured to store the drivers for the tuner demodulator 210, the communicator 220, the detector 230, the external device interface 240, the video processor 270, the display 275, the audio processor 280, and so on, together with related data, such as external data (e.g., audio and video data) received from the external device interface, or user data (e.g., key information, voice information, touch information) received by the user interface 265 or the receiver.
In some embodiments, the memory 260 specifically stores software and/or programs representing an Operating System (OS), which may include, for example, a kernel, middleware, an Application Programming Interface (API), and/or application programs. Illustratively, the kernel may control or manage system resources and the functions implemented by other programs (e.g., the middleware, APIs, or applications); at the same time, the kernel may provide an interface to allow the middleware, APIs, or applications to access the controller to control or manage system resources.
Specifically, the memory 260 may also be used to store the images and the preset labels of the images in the present embodiment. The preset label of an image is a mark identifying the image information, determined according to an algorithm and the image information; the specific determination process of the preset labels is described later.
The display 275 receives the image signal from the video processor 270 and displays video content, images, and the menu manipulation interface. The display 275 may be a liquid crystal display, an organic light-emitting display, or a projection device; the specific display device type, size, and resolution are not limited. The display 275 may include a display component for presenting a picture and a driving component that drives the display of the image. Alternatively, if the display 275 is a projection display, it may include a projection device and a projection screen.
Specifically, the display 275 is used to display information of the target images, that is, relevant information of the images that conform to the search instruction as determined by the layer-by-layer matching, such as a thumbnail and basic information of each image, e.g., time and size. The tuner demodulator 210 receives broadcast television signals in a wired or wireless manner, may perform modulation and demodulation processing such as amplification, mixing, and resonance, and is configured to demodulate, from a plurality of wireless or wired broadcast television signals, the audio/video signal carried in the frequency of the television channel selected by the user, as well as additional information (e.g., EPG data).
The tuner demodulator 210 responds to the television channel frequency selected by the user and the television signal carried by that frequency, as selected by the user and controlled by the controller 250.
The tuner demodulator 210 can receive television signals in various ways according to the broadcasting system of the television signal, such as terrestrial broadcasting, cable broadcasting, satellite broadcasting, or Internet broadcasting; according to different modulation types, a digital or analog modulation mode may be adopted; and both analog and digital signals can be demodulated according to the kind of television signal received.
In other exemplary embodiments, the tuner demodulator 210 may also be in an external device, such as an external set-top box. In this way, the set-top box outputs television signals after modulation and demodulation, which are input into the smart device 20 through the external device interface 240.
The external device interface 240 is a component that allows the controller 250 to control data transmission between the smart device 20 and external devices. The external device interface 240 may be connected to external apparatus such as a set-top box, a game device, or a notebook computer in a wired/wireless manner, and may receive data such as video signals (e.g., moving images), audio signals (e.g., music), and additional information (e.g., EPG) from the external apparatus.
The external device interface 240 may include a High-Definition Multimedia Interface (HDMI) terminal 241, a Composite Video Blanking Sync (CVBS) terminal 242, an analog or digital Component terminal 243, a Universal Serial Bus (USB) terminal 244, a Component terminal (not shown), a red-green-blue (RGB) terminal (not shown), and the like. The user interface 265 may be used to receive various user interactions; specifically, it transmits the user's input signals to the controller 250, or transmits output signals from the controller 250 to the user. For example, a remote controller may send input signals such as a power switch signal, a channel selection signal, or a volume adjustment signal input by the user to the user interface 265, which then transfers them to the controller 250; alternatively, output signals such as audio, video, or data processed by the controller 250 may be output from the user interface 265 to the remote controller, which displays the received output signals or outputs them in audio or vibration form.
In some embodiments, a user may enter user commands on a Graphical User Interface (GUI) displayed on the display 275; specifically, the user interface 265 may receive the user's input commands as the user selects different objects or items through the position of the remote control in the GUI. Here, a "user interface" is a medium for interaction and information exchange between an application or operating system and a user; it enables conversion between an internal form of information and a form acceptable to the user. A common presentation form of a user interface is the Graphical User Interface (GUI), a user interface related to computer operations that is displayed in a graphical manner. It may consist of interface elements such as icons, windows, and controls displayed on the display of the electronic device, where the controls may include visual interface elements such as icons, buttons, menus, tabs, text boxes, dialog boxes, status bars, channel bars, and widgets.
The audio processor 280 is configured to receive an external audio signal, decompress and decode it according to the standard codec protocol of the input signal, and perform audio data processing such as noise reduction, digital-to-analog conversion, and amplification, to obtain an audio signal that can be played by the speaker 286.
Illustratively, the audio processor 280 may support various audio formats, such as MPEG-2, MPEG-4, Advanced Audio Coding (AAC), and High-Efficiency AAC (HE-AAC).
The audio output interface 285 receives the audio signal output by the audio processor 280 under the control of the controller 250; the audio output interface 285 may include a speaker 286, or an external sound output terminal 287, such as a headphone output terminal, for output to an external sound-producing device.
In other exemplary embodiments, the video processor 270 may comprise one or more chips, and the audio processor 280 may also comprise one or more chips.
In still other exemplary embodiments, the video processor 270 and the audio processor 280 may be separate chips, or may be integrated together with the controller 250 in one or more chips.
The power supply 290 provides power support for the smart device 20 from an external power input under the control of the controller 250. The power supply 290 may be a built-in power supply circuit installed inside the smart device 20, or a power supply installed outside the smart device 20.
In conjunction with the above scenario, Fig. 3 shows a flowchart of an image processing method, which includes:
In step 301, the smart device 20 receives a search instruction from a user.
Illustratively, the user inputs the search instruction "find photos of Harry playing basketball at the gym" into the smart device 20. The user can input the search instruction to the smart device through the client 10, which can connect to the communicator 220 of the smart device 20 so that the two can send and receive messages to each other. A search instruction uttered by the user in voice form may also be detected and collected by the sound collector 231 of the detector 230.
In step 302, the smart device 20 extracts the n preset labels from the search instruction.
Illustratively, the processor 254 of the smart device 20 processes and recognizes the search instruction, extracting the preset labels of the user's search instruction: the first preset label "Harry", the second preset label "playing basketball", and the third preset label "gym". Extracting the preset labels by processing the search instruction prepares for the matching in subsequent steps, helps better understand the user's intention, and improves human-computer interaction efficiency.
In step 303, the intelligent device 20 performs layer-by-layer label matching with the images according to the importance degrees of the n preset labels and determines k target images, where k is a positive integer. The layer-by-layer label matching comprises:
for the ith preset label in the search instruction, matching the ith preset label against the (i-1)th matching result of the (i-1)th preset label in the search instruction to determine the ith matching result; if the number k of target images in the ith matching result is smaller than a first threshold, stopping the matching; otherwise, continuing to match the (i+1)th preset label in the search instruction against the ith matching result to determine the (i+1)th matching result; wherein the importance degree of the ith preset label is greater than that of the (i+1)th preset label, and i is a positive integer less than or equal to n.
Illustratively, the processor 254 of the smart device 20 first performs label matching of the most important first preset label "Harry" against the pre-stored images and their preset labels, finds a first matching result of 85 images containing the preset label "Harry", and determines that the 85 images of the first matching result are not fewer than the first preset threshold of 20. The processor 254 then searches within the first matching result for a second matching result of images containing the second preset label "playing basketball". If the number of images in the second matching result is 12, which is smaller than the first preset threshold of 20, the matching stops, the images in the second matching result are output as the target images, and information of the corresponding target images is displayed on the display 275 of the smart device 20. If the number of images in the second matching result is instead 25, which is not less than the first preset threshold of 20, the matching continues according to the above method until the number of images in a matching result is smaller than the first preset threshold. With this technical scheme, layer-by-layer label matching can be performed on the images in order of the importance degrees of the preset labels, ensuring that the finally obtained target images meet the user's intention and the quantity requirements for display, improving the user experience.
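Under the same assumptions, this walk-through would run through the match_layer_by_layer() sketch given in the first aspect roughly as follows; load_tagged_images() is a hypothetical loader, not an API of the device:

```python
# Reusing the match_layer_by_layer() sketch above.
images = load_tagged_images()                      # hypothetical: (id, tag_set) pairs
labels = ["Harry", "playing basketball", "gym"]    # importance order
targets = match_layer_by_layer(images, labels, first_threshold=20)
# Pass 1: 85 images tagged "Harry"               -> 85 >= 20, continue
# Pass 2: 12 of them tagged "playing basketball" -> 12 < 20, stop
# targets now holds the 12 images of the second matching result.
```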
In step 304, the smart device 20 displays information of the target images.
When the processor 254 of the smart device 20 determines that the number of images in a matching result is less than the first threshold, the target images are determined, and the display 275 then displays information of the corresponding target images.
By adopting the above technical scheme, the smart device 20 first processes and recognizes the search instruction input by the user, so as to effectively and automatically determine each preset label in the search instruction and its importance degree; the smart device 20 then matches the preset labels layer by layer against the images pre-stored in the memory 260 according to the importance degrees, ensuring that the number and quality of the finally identified target images meet the user's actual needs, improving image processing efficiency, and achieving efficient human-computer interaction.
Further, before step 301, the receiver of the smart device 20 may also receive images uploaded by the user, and the processor is configured to perform recognition processing on the uploaded images and determine the preset labels corresponding to each image; the preset labels include at least one of a person identity label, a person age label, a definition label, a scene label, an address label, and a time information label.
Optionally, the user opens the smart album application on the client 10, selects images, and uploads them to an image management cloud (which may be a home edge computing server with a photo storage and management function, or a centralized photo storage and management server in a machine room). After the smart album application of the smart device 20 receives the images at the image management cloud, the smart device 20 may perform algorithmic recognition processing on the images in the image management cloud, extracting information such as person features, environmental features, location information, shooting time, storage time, number of persons, clothing color, text content, picture background, day or night, and weather conditions. The extracted preset labels of each image may be stored in the memory 260 of the smart device 20, or stored in the image management cloud, where the smart device 20 can access them directly through the smart album application. Through these steps, the images uploaded by the user are processed and recognized and the preset labels of each image are extracted, making it convenient to search images by preset label later, improving image processing efficiency, and making the search results more accurate and better suited to the user's needs.
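One possible shape for the per-image record produced by this recognition step is sketched below; the field names are illustrative placeholders, not the actual storage format of the image management cloud:

```python
from dataclasses import dataclass, field

@dataclass
class ImageRecord:
    """Hypothetical per-image record holding the preset labels named above."""
    image_id: str
    identity: str | None = None      # person identity label, e.g. "Harry"
    age: int | None = None           # person age label
    definition: float | None = None  # definition (sharpness) label, 0.0-1.0
    scene: str | None = None         # scene label, e.g. "playing basketball"
    address: str | None = None       # address label, e.g. "gym"
    taken_at: str | None = None      # time information label
    tags: set[str] = field(default_factory=set)  # flat set used for matching
```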
Optionally, the user may connect to the photo management cloud through the smart album application of the client 10 or of the smart device 20 to view, download, or delete the photos stored in the cloud, and may browse the images by category according to the extracted label information; for example, the user may directly view the images corresponding to the time information label "January 2020", which makes categorized viewing convenient and improves the user experience.
Further, in a possible embodiment, the intelligent device 20 performs face feature recognition on the image in the memory to determine a recognition result; and if the similarity between the recognition result and the user face characteristics in the person database is greater than a second threshold value, determining the person identity label corresponding to the user face characteristics as the person identity label of the image.
Illustratively, the processor 254 of the smart device 20 performs face feature recognition on an image uploaded by the user and determines the person identity label corresponding to the face in the image. If the similarity between the face recognized in the image and the face features pre-stored in the person database for the user "Harry" reaches 99%, which is greater than the preset second threshold of 90%, the person identity label corresponding to the face in the image is directly determined to be "Harry".
Further, when the similarity is smaller than the second threshold and larger than a third threshold, the age difference between the person age label of the image and the person age label corresponding to the user's face features is determined; if the recognition error corresponding to the age difference is not smaller than the similarity error, the person identity label corresponding to the user's face features is determined as the person identity label of the image, where the similarity error is the difference between the second threshold and the similarity.
Illustratively, if the similarity between the face in the image and the face features of the user "Harry" in the user database is 89%, which is less than the second preset threshold of 90% and greater than the third preset threshold of 85%, age recognition is further performed on the face in the image. Suppose the age label recognized for the face in the image is "18 years" while the age label of the user "Harry" in the user database is "28 years". The preset age difference of 10 years corresponds to an allowable similarity error of 12%, and the actual similarity error is 1% (the difference between the second threshold and the similarity); since 1% < 12%, the person identity label corresponding to the face in the image is determined to be "Harry".
Optionally, if the similarity between the face in the image and the face features of the user "Harry" in the user database is 50%, which is less than the third threshold of 85%, the processor 254 of the smart device 20 does not recognize the user as "Harry", and instead continues matching against other face features in the user database, or receives a person identity label input by the user and marks the face in the image with it.
In one possible embodiment, the user may enter the search instruction in voice form in step 301.
Optionally, the smart device 20 may also collect voice information of the user.
In one possible embodiment, the n preset labels of the search instruction include a preset label corresponding to the central participle and a preset label corresponding to the auxiliary participle; the importance degree of the preset label corresponding to the central word segmentation is larger than that of the preset label corresponding to the auxiliary word segmentation.
Illustratively, the processor 254 of the smart device 20 may divide the preset labels extracted from the user's search instruction into preset labels corresponding to center participles and preset labels corresponding to auxiliary participles, setting the importance degree of the former greater than that of the latter. By distinguishing center participles from auxiliary participles, the importance ordering of the preset labels is better determined from the user's search instruction, so that the image search results obtained by layer-by-layer matching better meet the user's needs, improving the user experience and the search efficiency.
Further, in a possible embodiment, in step 302, the smart device 20 recognizes corresponding text information and intonation information from the user's voice information, and then determines, from the text information and intonation information, the center participles and their corresponding preset labels as well as the auxiliary participles and their corresponding preset labels.
Optionally, the processor 254 of the smart device 20 first recognizes the voice information to obtain the corresponding text information and intonation information, and then interprets the text content based on the intonation, such as rising (↗), falling (↘), rising-falling (∧), falling-rising (∨), and flat (→) tones. If a word carries a rising tone, it is treated as an auxiliary participle rather than a center participle.
Illustratively, the user speaks the phrase "find a photo of Harry playing basketball in the gym" to the smart device 20; the sound collector 231 of the smart device 20 collects and processes the voice information, converts it to generate the corresponding text message "find a photo of Harry playing basketball in the gym", and determines the intonation information of the voice. For example, the smart device 20 recognizes the text information and intonation information of "find a photo of Harry playing basketball in the gym" with a pre-trained language recognition model and extracts the corresponding preset labels, including the preset labels corresponding to the center participles and the preset labels corresponding to the auxiliary participles; this helps better rank the importance of each preset label, better satisfies the user's search intention, and improves the user experience.
When the smart device 20 determines from the intonation information that the voice information is a declarative sentence, it takes the subject, predicate, and object in the text information as center participles and extracts their corresponding preset labels, and takes the attributive, adverbial, and complement components in the text information as auxiliary participles and extracts their corresponding preset labels.
Illustratively, the processor 254 of the smart device 20 determines that "find a photo of Harry playing basketball in the gym" is a declarative sentence, and further determines "Harry", "find", and "playing basketball" as the subject, predicate, and object, where "Harry" is a person identity label among the preset labels and "playing basketball" is a scene label; "find" is discarded, and "Harry" and "playing basketball" are determined as the preset labels corresponding to the center participles. Similarly, the preset label "gym" corresponding to the auxiliary participle is determined. The importance degree of a preset label corresponding to a center participle is greater than that of a preset label corresponding to an auxiliary participle.
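A toy sketch of this split follows, assuming an upstream parser has already assigned a grammatical role to each token (the parser itself, and the stop-word list, are assumptions for illustration only):

```python
def split_participles(parsed_tokens):
    """Split (word, role) pairs into center and auxiliary participles.

    Roles such as "subject"/"predicate"/"object" map to center participles;
    "attributive"/"adverbial"/"complement" map to auxiliary participles.
    Pure search-action words (e.g. "find") carry no label and are discarded.
    """
    center_roles = {"subject", "predicate", "object"}
    auxiliary_roles = {"attributive", "adverbial", "complement"}
    stop_words = {"find", "search", "photo"}          # illustrative only

    center, auxiliary = [], []
    for word, role in parsed_tokens:
        if word in stop_words:
            continue                                   # discard action words
        if role in center_roles:
            center.append(word)
        elif role in auxiliary_roles:
            auxiliary.append(word)
    return center, auxiliary

tokens = [("Harry", "subject"), ("find", "predicate"),
          ("playing basketball", "object"), ("gym", "adverbial")]
print(split_participles(tokens))
# (['Harry', 'playing basketball'], ['gym'])
```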
Alternatively, the smart device 20 may preset the importance degree of each preset label; for example, the user may set the importance ordering of the preset labels through the user interface 265: person identity label > address label > scene label > time information label > person age label > definition label.
Optionally, the importance ordering of the preset labels preset by the user may be combined with the importance degrees of the center and auxiliary participles of the user's search instruction: the importance degrees of the preset labels corresponding to the center participles are determined first according to the user's preset ordering, then those of the preset labels corresponding to the auxiliary participles, finally yielding the importance ordering of the preset labels extracted from the search instruction. Illustratively, the importance ordering within the center participles is "Harry" > "playing basketball", so the finally determined importance ordering of the preset labels is "Harry" > "playing basketball" > "gym": the first preset label is "Harry", the second preset label is "playing basketball", and the third preset label is "gym". Determining the first, second, and third preset labels in this way fixes the order of the layer-by-layer label matching in the subsequent steps, makes the search results better meet the user's needs, and improves the user experience; a possible ordering computation is sketched below.
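Here is that sketch; the category ranking follows the user-preset ordering given above, and category_of is an assumed mapping from label to label category:

```python
CATEGORY_RANK = {"identity": 0, "address": 1, "scene": 2,
                 "time": 3, "age": 4, "definition": 5}

def order_labels(center, auxiliary, category_of):
    # Center participles always precede auxiliary ones; within each
    # group, ties are broken by the user-preset category ranking.
    key = lambda label: CATEGORY_RANK[category_of(label)]
    return sorted(center, key=key) + sorted(auxiliary, key=key)

category_of = {"Harry": "identity", "playing basketball": "scene",
               "gym": "scene"}.get
print(order_labels(["playing basketball", "Harry"], ["gym"], category_of))
# ['Harry', 'playing basketball', 'gym']
```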
Optionally, when the search instruction input by the user is in text form, the smart device 20 processes and analyzes the search instruction, extracts each preset label from it, sorts the extracted preset labels according to the preset importance degrees, and determines the first preset label, the second preset label, up to the nth preset label. Sorting the preset labels extracted from the search instruction by importance facilitates the layer-by-layer label matching with the images in the subsequent steps, ensures that the images in the matching result better conform to the user's search intention, and makes the number of images obtained better suit the user's habits, improving the user experience.
In a possible embodiment, step 303 further comprises:
when the number of target images in the matching result is greater than or equal to the first threshold, performing deduplication processing on the target images and/or removing target images whose definition does not meet a preset condition.
If, after matching against all the preset labels extracted from the search instruction, the number of images in the final nth matching result is still greater than the first threshold, the smart device 20 performs similarity recognition on the images in the nth matching result and deduplicates those with high similarity. Optionally, the smart device 20 may perform definition recognition on the images, attach a corresponding definition label to each image, and, when the number of images in the matching result is greater than the first preset threshold, remove the images whose definition is unsatisfactory. Illustratively, these operations may be performed by the processor 254 of the smart device 20.
Optionally, assuming there are 5 preset labels corresponding to center participles, the preset label with the highest importance among the 5 is first matched against the labels of the stored pictures to obtain a first group of pictures; the number of pictures in the first group is counted, and if it is greater than the first threshold (e.g., 9, the number of pictures the user has set the display 275 to show at once), the preset label with the second-highest importance among the 5 center participles is matched against the labels in the first group to obtain a second group of pictures, and so on, until the number of pictures obtained is less than or equal to the current first threshold.
Illustratively, after the first, second, and third preset labels of the user's search instruction "find a photo of Harry playing basketball in the gym" have each been matched, suppose the number of images in the resulting third matching result is 22; since 22 is not less than the set first preset threshold of 20, similarity recognition is performed on the 22 images of the third matching result, deduplication is carried out, images whose similarity exceeds the preset condition are removed, and the remaining images are determined as the target images and output to the display 275 for display.
Illustratively, if the number of images after deduplication is still larger than the first threshold (for example, none of the images are highly similar, so no deduplication occurred), definition recognition is further performed on the images to remove blurred ones (for example, an image with definition less than 80% is determined to be blurred).
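The deduplication-then-definition filtering described above might look as follows; similarity() and definition() are assumed helpers, the 80% definition cut follows the example just given, and the 95% duplicate cut is an assumption:

```python
def postprocess(targets, first_threshold, similarity, definition,
                sim_limit=0.95, def_limit=0.80):
    """Deduplicate near-identical targets, then drop blurred ones
    if the set is still above the first threshold."""
    kept = []
    for img in targets:
        # Keep only one copy of any group of near-duplicate images.
        if all(similarity(img, other) < sim_limit for other in kept):
            kept.append(img)
    if len(kept) > first_threshold:
        # Still too many: remove images judged blurred.
        kept = [img for img in kept if definition(img) >= def_limit]
    return kept
```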
In one possible embodiment, the first threshold is determined according to the size of the display screen and the display size of the information of the target images. Illustratively, the smart device 20 may set the number of images that best fits the display 275 based on the size of the display 275 itself and the display size set for the target images, or on the display quantity and size set by the user according to the user's needs. For example, as shown in Fig. 4, when the user wants to see the found images at a glance, the user may select "display large-size images"; the display 275 then shows 8 images at a time, and the first threshold may be set to 8 or a multiple of 8. As shown in Fig. 5, when the user selects "display small-size images", the display may show 20 images at a time, and the first threshold may be set to 20 or a multiple of 20. Setting the first threshold in this way allows the target images to be displayed flexibly and improves the user experience.
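A toy computation of the first threshold from the display layout, in the spirit of Figs. 4 and 5; the screen and thumbnail dimensions are illustrative assumptions:

```python
def first_threshold(screen_w, screen_h, thumb_w, thumb_h, pages=1):
    # Threshold = thumbnails per row * rows per screen * screens (paging).
    per_row = screen_w // thumb_w
    per_col = screen_h // thumb_h
    return per_row * per_col * pages

print(first_threshold(1920, 1080, 480, 540))  # 8 large thumbnails per screen
print(first_threshold(1920, 1080, 384, 270))  # 20 small thumbnails per screen
```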
In the embodiments of the present application, the preset labels are obtained by analyzing the voice or text information input by the user, and are then matched layer by layer against the stored images and their preset labels, so that pictures can be located quickly. When the preset labels are obtained by analyzing the voice text information, an importance ordering is further given by dividing the words into center participles and auxiliary participles, which strengthens the search logic and improves the search efficiency.
Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium, which includes instructions that, when executed on a computer, cause the computer to perform the above-mentioned image processing method.
Based on the same inventive concept, embodiments of the present invention also provide a computer program product, which, when running on a computer, causes the computer to execute the above-mentioned image processing method.
Based on the same technical concept, an embodiment of the present invention provides a computer, as shown in Fig. 6, including at least one processor 601 and a memory 602 connected to the at least one processor. The embodiment of the present invention does not limit the specific connection medium between the processor 601 and the memory 602; in Fig. 6 they are connected through a bus as an example. The bus may be divided into an address bus, a data bus, a control bus, and so on.
In the embodiment of the present invention, the memory 602 stores instructions executable by the at least one processor 601, and the at least one processor 601 may execute the steps included in the foregoing image processing method by executing the instructions stored in the memory 602.
The processor 601 is the control center of the computer and may be connected to various parts of the computer by various interfaces and lines; it implements data processing by running or executing the instructions stored in the memory 602 and calling the data stored in the memory 602. Optionally, the processor 601 may include one or more processing units, and may integrate an application processor, which mainly handles the operating system, user interface, and application programs, and a modem processor, which mainly handles instructions issued by operation and maintenance personnel. It will be appreciated that the modem processor may also not be integrated into the processor 601. In some embodiments, the processor 601 and the memory 602 may be implemented on the same chip; in other embodiments, they may be implemented separately on their own chips.
The processor 601 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present invention may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules within the processor.
The memory 602, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 602 may include at least one type of storage medium, for example, a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, or an optical disc. The memory 602 is any medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory 602 in the embodiments of the present invention may also be a circuit or any other device capable of performing a storage function, used for storing program instructions and/or data.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
The embodiments provided in the present application are only a few examples of the concepts in the present application, and do not limit the scope of the present application. Any other embodiments extended according to the scheme of the present application without inventive efforts will be within the scope of protection of the present application for a person skilled in the art.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.