Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the disclosed method of image authentication may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as an image recognition application, a data analysis application, a natural language processing application, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various terminal devices having a display screen, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the terminal devices listed above, and may be implemented as a plurality of pieces of software or software modules (e.g., for providing image input, text input, etc.) or as a single piece of software or a single software module, which is not particularly limited herein.
The server 105 may be a server that provides various services, such as a server that performs processing, such as recognition, on images input by the terminal devices 101, 102, 103. The server may perform processing such as recognition on the received image and feed back a processing result (e.g., a result of the recognition) to the terminal device.
It should be noted that the method of image authentication provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for image authentication is generally disposed in the server 105.
It should be noted that the server 105 may also store images locally, and the server 105 may directly extract a local image and obtain the recognition result through the recognition method; in this case, the exemplary system architecture 100 may not include the terminal devices 101, 102, 103 and the network 104.
It should be noted that the terminal devices 101, 102, and 103 may also have an image recognition application installed thereon, in which case the method of image authentication may also be executed by the terminal devices 101, 102, and 103. In this case, the exemplary system architecture 100 may not include the server 105 and the network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as a plurality of pieces of software or software modules (for example, for providing an image authentication service) or as a single piece of software or a single software module, which is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of a method of image authentication in accordance with the present disclosure is shown. The method of image authentication includes the following steps:
step 201, a first target image is acquired.
In some embodiments, the executing subject of the method of image authentication (e.g., the terminal device shown in fig. 1) may acquire a first target image. An image is a similar, vivid description or portrayal of an objective object and is one of the most commonly used information carriers in human social activities; here, an image refers to any picture having a visual effect. In particular, the first target image may be an image of an apple.
Step 202, inputting the first target image into a pre-trained detection model to generate a second target image.
In some embodiments, the executing subject inputs the first target image into a pre-trained detection model to generate a process image. The pre-trained detection model includes a convolutional layer, a region proposal layer, a matching layer, a full convolutional layer, and an output layer.
In some optional implementations of some embodiments, the executing subject inputs the first target image into a pre-trained detection model, and an output of the pre-trained detection model is used as the process image. The process of generating a process image from a pre-trained detection model may include the steps of:
First, the first target image is input into the convolutional layer to generate a first feature map.
Second, the first feature map is input into the region proposal layer to generate a candidate region map.
Specifically, the region proposal layer slides windows (anchors) of different scales and aspect ratios over the first feature map to generate the candidate region map. Specifically, 3 window scales may be defined. The size of the first window may be 16 pixels, and it includes 3 windows with aspect ratios of 1:1, 1:2, and 2:1, respectively. The size of the second window may be 8 pixels, and it includes 3 windows with aspect ratios of 1:1, 1:2, and 2:1, respectively. The size of the third window may be 32 pixels, and it includes 3 windows with aspect ratios of 1:1, 1:2, and 2:1, respectively. The region proposal layer slides these 3 groups of windows over the first feature map and, for each window, calculates the intersection-over-union ratio using the following formula:
IoU = S_{A∩B} / S_{A∪B},
wherein IoU represents the intersection-over-union ratio, A represents the window generated by the region proposal layer, B represents the correct window in the sample database used for pre-training, S_{A∩B} denotes the overlap area of A and B, and S_{A∪B} denotes the area of the union of A and B. In response to the value of IoU being greater than 0.5, the region A is included in the candidate regions to obtain the candidate region map.
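By way of a non-limiting illustration, the intersection-over-union test described above may be sketched in Python as follows; the box representation (x_min, y_min, x_max, y_max), the helper names, and the default threshold of 0.5 are assumptions introduced here only for illustration:

```python
def iou(a, b):
    """Return S_{A∩B} / S_{A∪B} for two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)   # overlap area S_{A∩B}
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                             # union area S_{A∪B}
    return inter / union if union > 0 else 0.0

def select_candidates(windows, correct_windows, threshold=0.5):
    """Keep every sliding window whose IoU with some correct window exceeds the threshold."""
    return [w for w in windows
            if any(iou(w, gt) > threshold for gt in correct_windows)]
```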
Third, the first feature map and the candidate region map are input into the matching layer to generate a second feature map.
Specifically, the matching layer pools the candidate region maps, so that candidate region maps of different sizes are pooled into second feature maps of a fixed size. Optionally, the pooling operation may use a bilinear interpolation algorithm to obtain the interpolated second feature map, and the part of the second feature map that is not subjected to interpolation processing is marked as a candidate region.
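One possible realization of this bilinear pooling is sketched below, assuming PyTorch, integer box coordinates on the feature map, and a fixed 7×7 output size; all of these choices are assumptions for illustration rather than the claimed implementation:

```python
import torch
import torch.nn.functional as F

def pool_candidate_regions(feature_map, boxes, output_size=(7, 7)):
    """Pool variably sized candidate regions of a feature map to one fixed size
    using bilinear interpolation.

    feature_map: tensor of shape (C, H, W); boxes: iterable of integer
    (x_min, y_min, x_max, y_max) in feature-map coordinates.
    Returns a tensor of shape (N, C, *output_size).
    """
    pooled = []
    for x_min, y_min, x_max, y_max in boxes:
        region = feature_map[:, y_min:y_max, x_min:x_max]        # (C, h, w), size varies per box
        region = F.interpolate(region.unsqueeze(0), size=output_size,
                               mode="bilinear", align_corners=False)
        pooled.append(region.squeeze(0))                         # fixed-size (C, 7, 7) map
    return torch.stack(pooled)
```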
Fourth, the second feature map is input into the full convolutional layer to generate a third feature map.
Fifth, the third feature map is input into the output layer to generate the process image.
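Putting the five steps together, the data flow through the detection model can be sketched schematically as follows; the five layers are passed in as opaque callables, which is an assumption made here purely to show the composition, not a definition of those layers:

```python
def detection_forward(first_target_image, conv, region_proposal, matching, full_conv, output):
    """Compose the five layers of the pre-trained detection model described above."""
    first_feature_map = conv(first_target_image)                      # step one
    candidate_region_map = region_proposal(first_feature_map)         # step two
    second_feature_map = matching(first_feature_map, candidate_region_map)  # step three
    third_feature_map = full_conv(second_feature_map)                 # step four
    return output(third_feature_map)                                  # step five: the process image
```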
In some optional implementations of some embodiments, the executing subject cuts the process image to generate a second target image. The outermost edges of the marked candidate regions form a bounding box in the process image. The outermost edge may be defined by the minimum and maximum abscissa values and the minimum and maximum ordinate values of all pixels inside a marked candidate region, so that the outermost edges form a rectangle. Based on this rectangular box, the candidate region is cut out of the process image using a photo-processing toolkit. The width of the rectangular box is set to a uniform pixel size; specifically, the width may be 300 pixels. The height of the rectangular box is transformed correspondingly according to the aspect ratio of the candidate region picture, so as to generate the second target image.
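A minimal sketch of this cutting and rescaling step is given below, with the Pillow library standing in for the photo-processing toolkit; the function name, the file-path argument, and the default width of 300 pixels are assumptions introduced for illustration:

```python
from PIL import Image

def cut_second_target_image(process_image_path, box, target_width=300):
    """Cut a candidate region out of the process image and rescale it to a uniform
    width while keeping its aspect ratio, yielding a second target image.

    box is the outermost-edge rectangle (x_min, y_min, x_max, y_max) in pixels.
    """
    image = Image.open(process_image_path)
    region = image.crop(box)                                   # cut along the bounding rectangle
    width, height = region.size
    target_height = round(height * target_width / width)       # preserve the aspect ratio
    return region.resize((target_width, target_height))
```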
Step 203, inputting the second target image into a pre-trained classification model, and generating an identification result set of the second target image.
In some embodiments, the executing subject inputs the second target image into a pre-trained classification model. Optionally, the pre-trained classification model includes a first number of pre-trained neural networks, and the first number of pre-trained neural networks correspond to a first number of predetermined image categories.
Optionally, the pre-trained neural network is a residual network. The residual network is composed of a second number of residual modules, wherein each residual module generates its output using the following equation:
y = F(x, {W_i}) + x,
wherein x is the input of the residual module, y is the output of the residual module, F(·) is the residual function, i is the layer index within the residual module, W_i denotes the weight matrix of the i-th layer, and {W_i} denotes the set of weight matrices of all layers in the residual module. Specifically, the residual function F(·) may be expressed as the following equation:
F(x) = W_2 σ(W_1 x),
wherein x is the input of the residual module, W_1 denotes the weight matrix of layer 1, W_2 denotes the weight matrix of layer 2, and σ denotes the activation function. In particular, the activation function may be a function that runs on a neuron of the artificial neural network and is responsible for mapping the input of the neuron to its output. Specifically, the activation function may be a ReLU function, expressed as:
σ(x) = max(0, x),
where σ denotes the activation function, x denotes an arbitrary real-valued input, and max(·) returns the larger of its two arguments.
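A minimal sketch of such a residual module is shown below, assuming PyTorch and two 3×3 convolutional layers per module; the channel count and the number of stacked modules are illustrative assumptions, not values fixed by this disclosure:

```python
import torch.nn as nn

class ResidualModule(nn.Module):
    """One residual module computing y = F(x, {W_i}) + x with F(x) = W_2 * relu(W_1 * x)."""

    def __init__(self, channels):
        super().__init__()
        self.w1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)  # layer 1
        self.w2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)  # layer 2
        self.relu = nn.ReLU()                                    # sigma(x) = max(0, x)

    def forward(self, x):
        residual = self.w2(self.relu(self.w1(x)))                # F(x) = W_2 * sigma(W_1 * x)
        return residual + x                                      # y = F(x) + x

# A residual network then stacks a "second number" of such modules, e.g. four:
backbone = nn.Sequential(*[ResidualModule(64) for _ in range(4)])
```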
In some optional implementations of some embodiments, the executing subject inputs the second target image into the pre-trained classification model and obtains an output result set. The pre-trained classification model includes a first number of pre-trained neural networks. The executing subject inputs the second target image into the first number of pre-trained neural networks to obtain a first number of output results, and the output result set includes the first number of output results. The output result set of the pre-trained classification model is determined as the identification result set of the second target image. In other words, the output result set of the pre-trained classification model is the set of output results of the first number of pre-trained neural networks; each identification result is the output result of one pre-trained neural network, and the identification result set includes a first number of identification results.
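The collection of the identification result set can be sketched as follows; the assumption that each pre-trained network returns a single scalar score for its own category is introduced here for illustration and is not fixed by the text:

```python
import torch

def identification_result_set(second_target_image, networks):
    """Feed the second target image through the first number of pre-trained networks
    (one per predetermined image category) and collect one score from each."""
    results = []
    with torch.no_grad():
        for network in networks:                       # one network per predetermined category
            network.eval()
            score = network(second_target_image)       # assumed: a single-element tensor score
            results.append(score.item())
    return results
```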
Step 204, based on the set of authentication results, determining the category of the first target image.
In some embodiments, the executing entity determines the category of the first target image based on the set of authentication results.
In response to all of the values in the identification result set being negative, the category of the first target image is determined to be null; specifically, the first target image does not belong to any predetermined image category. In response to the values in the identification result set not all being negative, the category corresponding to the maximum value in the identification result set is determined as the category of the first target image.
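This decision rule is illustrated by the short sketch below; the function name and the example scores are hypothetical:

```python
def decide_category(result_set, categories):
    """Return None (the 'null' category) if every score is negative,
    otherwise the category whose score is largest."""
    if all(score < 0 for score in result_set):
        return None                                   # image matches no predetermined category
    best = max(range(len(result_set)), key=lambda i: result_set[i])
    return categories[best]

# Example: scores from three per-category networks
print(decide_category([-0.2, 1.3, 0.4], ["first kind", "second kind", "third kind"]))  # "second kind"
```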
Optionally, the executing subject sends the category of the first target image to a device supporting display and controls the device to display the category. The display-supporting device may be a device in communication connection with the executing subject that can display an image category according to the received information. For example, the device displays the category information "apple of the first kind", where the first kind may indicate that the origin of the apple is Shandong Province. For another example, the device displays the category information "apple of the second kind", where the second kind may indicate that the origin of the apple is Hebei Province. This automatic display emphasizes the category of the first target image, which helps improve the accuracy and convenience of the user's judgment and decisions about the first target image.
One embodiment presented in fig. 2 has the following beneficial effects: based on the first target image, the second target image is automatically obtained by using the pre-trained detection model, without manually determining the key target area in the first target image. The second target image is classified by using the pre-trained classification model to obtain the identification result set of the second target image, and based on the identification result set, the category of the first target image is automatically generated. The embodiments of the disclosure generate the second target image by using the pre-trained detection model, automatically acquiring the effective image target area without manual intervention; they further automatically generate the identification result set by using the pre-trained classification model and automatically determine the category of the first target image from the identification result set, thereby improving the degree of automation and the convenience of the image authentication process.
With continued reference to fig. 3, a flow 300 of one embodiment of the training step for pre-training a neural network in accordance with the present disclosure is shown. The training step may include the following steps:
step 301, determining a network structure of the initial neural network and initializing network parameters of the initial neural network.
In this embodiment, the executing subject of the training step may be the same as or different from the executing subject of the method of image authentication (e.g., the terminal device shown in fig. 1). If they are the same, the executing subject of the training step may store the network structure information and the parameter values of the network parameters of the trained neural network locally after training. If they are different, the executing subject of the training step may send the network structure information and the parameter values of the network parameters of the trained neural network to the executing subject of the method of image authentication after training.
In this embodiment, the executing agent of the training step may first determine the network structure of the initial neural network. For example, it is necessary to determine which layers the initial neural network includes, the connection order relationship between layers, and which neurons each layer includes, the weight (weight) and bias term (bias) corresponding to each neuron, the activation function of each layer, and so on. Optionally, the neural network may comprise a second number of residual modules.
The executing subject of the training step may then initialize the network parameters of the initial neural network. In practice, the network parameters (e.g., weight parameters and bias parameters) of the initial neural network may be initialized with small, mutually different random numbers. The small random numbers ensure that the network does not enter a saturated state, which would cause training to fail, because of excessively large weight values, and the mutually different random numbers ensure that the network can learn normally.
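Such an initialization could look like the sketch below, assuming PyTorch; the standard deviation of 0.01 and the restriction to linear and convolutional layers are assumptions made for illustration:

```python
import torch.nn as nn

def init_small_random(module, std=0.01):
    """Initialise weights with small, mutually different random numbers and biases with
    zeros, so that no weight is large enough to push the network into saturation."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# network.apply(init_small_random)   # applied recursively to every layer of the initial network
```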
Step 302, a training sample set is obtained.
In this embodiment, the executing subject of the training step may obtain the training sample set locally or remotely from other terminal devices connected to it through a network. Each training sample includes a sample image and a sample category corresponding to the sample image.
Step 303, selecting a sample from the sample set, using the sample image included in the sample as an input and the pre-obtained sample category corresponding to the sample image as an expected output, and training the neural network.
In this embodiment, the executing subject of the training step may perform the following first step of training the neural network.
Step one, the neural network training process.
First, the sample image included in the selected training sample is input into the initial neural network to obtain the category of the selected sample.
Second, the category of the selected sample is compared with the corresponding sample category. Specifically, the difference between the category of the selected sample and the corresponding sample category may first be calculated using a preset loss function. For example, the difference may be calculated using a cross-entropy loss function; using the cross-entropy loss function together with a sigmoid function avoids the problem of a slowed learning rate during gradient descent.
Third, in response to determining that the initial neural network reaches a preset optimization goal, the training is ended, and the trained initial neural network is used as the pre-trained neural network. Specifically, the preset optimization goal may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold.
Step 304, in response to determining that the initial neural network is not trained, adjusting relevant parameters in the initial neural network, and reselecting samples from the sample set, and performing the training step again using the adjusted initial neural network as the initial neural network.
In this embodiment, the executing subject of the training step adjusts the relevant parameters in the initial neural network in response to determining that the initial neural network is not trained, that is, in response to the initial neural network not reaching the optimization goal. Specifically, various implementations may be employed to adjust the network parameters of the initial neural network based on the difference between the category of the selected sample and the corresponding sample category. For example, the Adam algorithm, the BP (Back Propagation) algorithm, or the SGD (Stochastic Gradient Descent) algorithm may be used to adjust the network parameters of the initial neural network.
Optionally, the executing subject reselects a sample from the sample set, takes the sample image included in the sample as the input and the pre-obtained sample category corresponding to the sample image as the expected output, uses the adjusted initial neural network as the initial neural network, and performs step one again to train the neural network.
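The training and parameter-adjustment steps described above can be tied together in a minimal sketch such as the following; the choice of PyTorch, the Adam optimizer, the learning rate, and the default stopping bounds are all assumptions introduced for illustration only:

```python
import time
import torch
import torch.nn as nn

def train(network, sample_loader, max_seconds=3600, max_steps=10000, loss_threshold=1e-3):
    """Minimal sketch of steps 303 and 304: cross-entropy loss, parameter adjustment, and
    stopping once the preset time, iteration count, or difference threshold is reached."""
    criterion = nn.CrossEntropyLoss()                              # preset loss function
    optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)    # Adam; BP/SGD would also work
    start, step = time.time(), 0
    while True:
        for sample_image, sample_class in sample_loader:           # reselect samples from the set
            prediction = network(sample_image)                      # category of the selected sample
            loss = criterion(prediction, sample_class)              # difference from the sample category
            optimizer.zero_grad()
            loss.backward()                                         # back propagation
            optimizer.step()                                        # adjust the network parameters
            step += 1
            if (time.time() - start > max_seconds or step >= max_steps
                    or loss.item() < loss_threshold):
                return network                                      # optimization goal reached
```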
In this embodiment, the executing subject of the training step determines the initial neural network obtained by training as a neural network trained in advance.
One embodiment presented in fig. 3 has the following beneficial effects: a neural network is trained based on the sample images and sample classes corresponding to the sample images. The neural network can be directly applied to determine the probability that the input image corresponds to the category. The first target image is directly input into the neural network without manual intervention or extraction of the characteristics of the image, and the probability of the first target image corresponding to the category can be automatically obtained.
Referring now to fig. 4, a block diagram of a computer system 400 suitable for use in implementing a server of an embodiment of the present disclosure is shown. The server shown in fig. 4 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 406 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: a storage section 406 including a hard disk and the like; and a communication section 407 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 407 performs communication processing via a network such as the Internet. A drive 408 is also connected to the I/O interface 405 as needed. A removable medium 409, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 408 as necessary, so that a computer program read therefrom is installed into the storage section 406 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 407 and/or installed from the removable medium 409. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by the Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description is only exemplary of the preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned technical features, and also encompasses other technical solutions formed by any combination of the above-mentioned technical features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.