Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
It is noted that the modifiers "a", "an", and "the" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will recognize that they should be understood as "one or more" unless the context clearly dictates otherwise.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the disclosed method of image authentication may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as an image recognition application, a data analysis application, a natural language processing application, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various terminal devices having a display screen, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the terminal devices listed above, and may be implemented as a plurality of pieces of software or software modules (e.g., for providing image input, text input, etc.) or as a single piece of software or a single software module, which is not particularly limited herein.
The server 105 may be a server that provides various services, such as a server that performs processing, such as recognition, on images input by the terminal devices 101, 102, 103. The server may perform processing such as recognition on the received image and feed back a processing result (e.g., a result of the recognition) to the terminal device.
It should be noted that the method of image authentication provided by the embodiments of the present disclosure is generally performed by the server 105, and accordingly, the apparatus for image authentication is generally disposed in the server 105.
It should be noted that the server 105 may also store images locally, and the server 105 may directly extract a local image and obtain the recognition result through the recognition method; in this case, the exemplary system architecture 100 may not include the terminal devices 101, 102, 103 and the network 104.
It should be noted that the terminal devices 101, 102, and 103 may also have an image recognition application installed thereon, in which case the method of image authentication may also be executed by the terminal devices 101, 102, and 103. In this case, the exemplary system architecture 100 may not include the server 105 and the network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as a plurality of pieces of software or software modules (for example, for providing an image authentication service) or as a single piece of software or a single software module, which is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of some embodiments of a method of image authentication in accordance with the present disclosure is shown. The method of image authentication includes the following steps:
step 201, a first target image is acquired.
In some embodiments, the executing subject of the method of image authentication (e.g., the terminal device shown in fig. 1) may acquire a first target image. An image is a similar, vivid description or portrayal of an objective object and is one of the most commonly used information carriers in human social activities; here, an image refers to any picture having a visual effect. In particular, the first target image may be an image of an apple.
Step 202, inputting the first target image into a pre-trained detection model to generate a second target image.
In some embodiments, the executing subject inputs the first target image into a pre-trained detection model to generate a process image. The pre-trained detection model includes a convolutional layer, a region proposal layer, a matching layer, a full convolutional layer, and an output layer.
In some optional implementations of some embodiments, the executing subject inputs the first target image into a pre-trained detection model, and an output of the pre-trained detection model is used as the process image. The process of generating a process image from a pre-trained detection model may include the steps of:
First, the first target image is input into the convolutional layer to generate a first feature map.
Second, the first feature map is input into the region proposal layer to generate a candidate region map.
Specifically, the region proposal layer slides windows (anchors) of different scales and aspect ratios over the first feature map to generate the candidate region map. Specifically, 3 window scales may be defined. The size of the first window may be 16 pixels, and it includes 3 windows with aspect ratios of 1:1, 1:2, and 2:1, respectively. The size of the second window may be 8 pixels, and it includes 3 windows with aspect ratios of 1:1, 1:2, and 2:1, respectively. The size of the third window may be 32 pixels, and it includes 3 windows with aspect ratios of 1:1, 1:2, and 2:1, respectively. The region proposal layer slides these 3 groups of windows over the first feature map and, for each window, calculates the intersection-over-union ratio using the following formula:
IoU = S_{A∩B} / S_{A∪B},
wherein IoU represents the intersection-over-union ratio, A represents the window generated by the region proposal layer, B represents the correct window in the sample database used for pre-training, S_{A∩B} denotes the overlap area of A and B, and S_{A∪B} denotes the area of the union of A and B. In response to the value of IoU being greater than 0.5, the region A is included in the candidate regions to obtain the candidate region map.
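By way of a non-limiting illustration, the intersection-over-union test described above may be sketched in Python as follows; the box representation (x_min, y_min, x_max, y_max), the helper names, and the default threshold of 0.5 are assumptions introduced here only for illustration:

```python
def iou(a, b):
    """Return S_{A∩B} / S_{A∪B} for two axis-aligned boxes (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)   # overlap area S_{A∩B}
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                             # union area S_{A∪B}
    return inter / union if union > 0 else 0.0

def select_candidates(windows, correct_windows, threshold=0.5):
    """Keep every sliding window whose IoU with some correct window exceeds the threshold."""
    return [w for w in windows
            if any(iou(w, gt) > threshold for gt in correct_windows)]
```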
Third, the first feature map and the candidate region map are input into the matching layer to generate a second feature map.
Specifically, the matching layer pools the candidate region maps, so that candidate region maps of different sizes are pooled into second feature maps of a fixed size. Optionally, the pooling operation may use a bilinear interpolation algorithm to obtain the interpolated second feature map, and the part of the second feature map that is not subjected to interpolation processing is marked as a candidate region.
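One possible realization of this bilinear pooling is sketched below, assuming PyTorch, integer box coordinates on the feature map, and a fixed 7×7 output size; all of these choices are assumptions for illustration rather than the claimed implementation:

```python
import torch
import torch.nn.functional as F

def pool_candidate_regions(feature_map, boxes, output_size=(7, 7)):
    """Pool variably sized candidate regions of a feature map to one fixed size
    using bilinear interpolation.

    feature_map: tensor of shape (C, H, W); boxes: iterable of integer
    (x_min, y_min, x_max, y_max) in feature-map coordinates.
    Returns a tensor of shape (N, C, *output_size).
    """
    pooled = []
    for x_min, y_min, x_max, y_max in boxes:
        region = feature_map[:, y_min:y_max, x_min:x_max]        # (C, h, w), size varies per box
        region = F.interpolate(region.unsqueeze(0), size=output_size,
                               mode="bilinear", align_corners=False)
        pooled.append(region.squeeze(0))                         # fixed-size (C, 7, 7) map
    return torch.stack(pooled)
```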
Fourth, the second feature map is input into the full convolutional layer to generate a third feature map.
Fifth, the third feature map is input into the output layer to generate the process image.
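Putting the five steps together, the data flow through the detection model can be sketched schematically as follows; the five layers are passed in as opaque callables, which is an assumption made here purely to show the composition, not a definition of those layers:

```python
def detection_forward(first_target_image, conv, region_proposal, matching, full_conv, output):
    """Compose the five layers of the pre-trained detection model described above."""
    first_feature_map = conv(first_target_image)                      # step one
    candidate_region_map = region_proposal(first_feature_map)         # step two
    second_feature_map = matching(first_feature_map, candidate_region_map)  # step three
    third_feature_map = full_conv(second_feature_map)                 # step four
    return output(third_feature_map)                                  # step five: the process image
```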
In some optional implementations of some embodiments, the executing subject cuts the process image to generate a second target image. The outermost edges of the marked candidate regions form a bounding box in the process image. The outermost edge may be defined by the minimum and maximum abscissa values and the minimum and maximum ordinate values of all pixels inside a marked candidate region, so that the outermost edges form a rectangle. Based on this rectangular box, the candidate region is cut out of the process image using a photo-processing toolkit. The width of the rectangular box is set to a uniform pixel size; specifically, the width may be 300 pixels. The height of the rectangular box is transformed correspondingly according to the aspect ratio of the candidate region picture, so as to generate the second target image.
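A minimal sketch of this cutting and rescaling step is given below, with the Pillow library standing in for the photo-processing toolkit; the function name, the file-path argument, and the default width of 300 pixels are assumptions introduced for illustration:

```python
from PIL import Image

def cut_second_target_image(process_image_path, box, target_width=300):
    """Cut a candidate region out of the process image and rescale it to a uniform
    width while keeping its aspect ratio, yielding a second target image.

    box is the outermost-edge rectangle (x_min, y_min, x_max, y_max) in pixels.
    """
    image = Image.open(process_image_path)
    region = image.crop(box)                                   # cut along the bounding rectangle
    width, height = region.size
    target_height = round(height * target_width / width)       # preserve the aspect ratio
    return region.resize((target_width, target_height))
```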
Step 203, inputting the second target image into a pre-trained classification model, and generating an identification result set of the second target image.
In some embodiments, the executing subject inputs the second target image into a pre-trained classification model. Optionally, the pre-trained classification model includes a first number of pre-trained neural networks, and the first number of pre-trained neural networks correspond to a first number of predetermined image categories.
Optionally, the pre-trained neural network is a residual network. The residual network is composed of a second number of residual modules, wherein each residual module generates its output using the following equation:
y = F(x, {W_i}) + x,
wherein x is the input of the residual module, y is the output of the residual module, F(·) is the residual function, i is the layer index within the residual module, W_i denotes the weight matrix of the i-th layer, and {W_i} denotes the set of weight matrices of all layers in the residual module. Specifically, the residual function F(·) may be expressed as the following equation:
F(x) = W_2 σ(W_1 x),
wherein x is the input of the residual module, W_1 denotes the weight matrix of layer 1, W_2 denotes the weight matrix of layer 2, and σ denotes the activation function. In particular, the activation function may be a function that runs on a neuron of the artificial neural network and is responsible for mapping the input of the neuron to its output. Specifically, the activation function may be a ReLU function, expressed as:
σ(x) = max(0, x),
where σ denotes the activation function, x denotes an arbitrary real-valued input, and max(·) returns the larger of its two arguments.
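A minimal sketch of such a residual module is shown below, assuming PyTorch and two 3×3 convolutional layers per module; the channel count and the number of stacked modules are illustrative assumptions, not values fixed by this disclosure:

```python
import torch.nn as nn

class ResidualModule(nn.Module):
    """One residual module computing y = F(x, {W_i}) + x with F(x) = W_2 * relu(W_1 * x)."""

    def __init__(self, channels):
        super().__init__()
        self.w1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)  # layer 1
        self.w2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)  # layer 2
        self.relu = nn.ReLU()                                    # sigma(x) = max(0, x)

    def forward(self, x):
        residual = self.w2(self.relu(self.w1(x)))                # F(x) = W_2 * sigma(W_1 * x)
        return residual + x                                      # y = F(x) + x

# A residual network then stacks a "second number" of such modules, e.g. four:
backbone = nn.Sequential(*[ResidualModule(64) for _ in range(4)])
```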
In some optional implementations of some embodiments, the executing subject inputs the second target image into the pre-trained classification model and obtains an output result set. The pre-trained classification model includes a first number of pre-trained neural networks. The executing subject inputs the second target image into the first number of pre-trained neural networks to obtain a first number of output results, and the output result set includes the first number of output results. The output result set of the pre-trained classification model is determined as the identification result set of the second target image. In other words, the output result set of the pre-trained classification model is the set of output results of the first number of pre-trained neural networks; each identification result is the output result of one pre-trained neural network, and the identification result set includes a first number of identification results.
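The collection of the identification result set can be sketched as follows; the assumption that each pre-trained network returns a single scalar score for its own category is introduced here for illustration and is not fixed by the text:

```python
import torch

def identification_result_set(second_target_image, networks):
    """Feed the second target image through the first number of pre-trained networks
    (one per predetermined image category) and collect one score from each."""
    results = []
    with torch.no_grad():
        for network in networks:                       # one network per predetermined category
            network.eval()
            score = network(second_target_image)       # assumed: a single-element tensor score
            results.append(score.item())
    return results
```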
Step 204, based on the set of authentication results, determining the category of the first target image.
In some embodiments, the executing entity determines the category of the first target image based on the set of authentication results.
In response to all of the values in the identification result set being negative, the category of the first target image is determined to be null; specifically, the first target image does not belong to any predetermined image category. In response to the values in the identification result set not all being negative, the category corresponding to the maximum value in the identification result set is determined as the category of the first target image.
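This decision rule is illustrated by the short sketch below; the function name and the example scores are hypothetical:

```python
def decide_category(result_set, categories):
    """Return None (the 'null' category) if every score is negative,
    otherwise the category whose score is largest."""
    if all(score < 0 for score in result_set):
        return None                                   # image matches no predetermined category
    best = max(range(len(result_set)), key=lambda i: result_set[i])
    return categories[best]

# Example: scores from three per-category networks
print(decide_category([-0.2, 1.3, 0.4], ["first kind", "second kind", "third kind"]))  # "second kind"
```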
Optionally, the executing subject sends the category of the first target image to a device supporting display and controls the device to display the category. The display-supporting device may be a device in communication connection with the executing subject that can display an image category according to the received information. For example, the device displays the category information "apple of the first kind", where the first kind may indicate that the origin of the apple is Shandong Province. For another example, the device displays the category information "apple of the second kind", where the second kind may indicate that the origin of the apple is Hebei Province. This automatic display emphasizes the category of the first target image, which helps improve the accuracy and convenience of the user's judgment and decisions about the first target image.
One embodiment presented in fig. 2 has the following beneficial effects: based on the first target image, the second target image is automatically obtained by using the pre-trained detection model, without manually determining the key target area in the first target image. The second target image is classified by using the pre-trained classification model to obtain the identification result set of the second target image, and based on the identification result set, the category of the first target image is automatically generated. The embodiments of the disclosure generate the second target image by using the pre-trained detection model, automatically acquiring the effective image target area without manual intervention; they further automatically generate the identification result set by using the pre-trained classification model and automatically determine the category of the first target image from the identification result set, thereby improving the degree of automation and the convenience of the image authentication process.
With continued reference to fig. 3, a flow 300 of one embodiment of the training step for pre-training a neural network in accordance with the present disclosure is shown. The training step may include the following steps:
step 301, determining a network structure of the initial neural network and initializing network parameters of the initial neural network.
In this embodiment, the executing subject of the training step may be the same as or different from the executing subject of the method of image authentication (e.g., the terminal device shown in fig. 1). If they are the same, the executing subject of the training step may store the network structure information and the parameter values of the network parameters of the trained neural network locally after training. If they are different, the executing subject of the training step may send the network structure information and the parameter values of the network parameters of the trained neural network to the executing subject of the method of image authentication after training.
In this embodiment, the executing agent of the training step may first determine the network structure of the initial neural network. For example, it is necessary to determine which layers the initial neural network includes, the connection order relationship between layers, and which neurons each layer includes, the weight (weight) and bias term (bias) corresponding to each neuron, the activation function of each layer, and so on. Optionally, the neural network may comprise a second number of residual modules.
The executing subject of the training step may then initialize the network parameters of the initial neural network. In practice, the network parameters (e.g., weight parameters and bias parameters) of the initial neural network may be initialized with small, mutually different random numbers. The small random numbers ensure that the network does not enter a saturated state, which would cause training to fail, because of excessively large weight values, and the mutually different random numbers ensure that the network can learn normally.
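Such an initialization could look like the sketch below, assuming PyTorch; the standard deviation of 0.01 and the restriction to linear and convolutional layers are assumptions made for illustration:

```python
import torch.nn as nn

def init_small_random(module, std=0.01):
    """Initialise weights with small, mutually different random numbers and biases with
    zeros, so that no weight is large enough to push the network into saturation."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.normal_(module.weight, mean=0.0, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# network.apply(init_small_random)   # applied recursively to every layer of the initial network
```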
Step 302, a training sample set is obtained.
In this embodiment, the executing subject of the training step may obtain the training sample set locally or remotely from other terminal devices connected to it through a network. Each training sample includes a sample image and a sample category corresponding to the sample image.
Step 303, selecting a sample from the sample set, using the sample image included in the sample as an input and the pre-obtained sample category corresponding to the sample image as an expected output, and training the neural network.
In this embodiment, the executing subject of the training step may perform the following first step of training the neural network.
Step one, the neural network training process.
First, the sample image included in the selected training sample is input into the initial neural network to obtain the category of the selected sample.
Second, the category of the selected sample is compared with the corresponding sample category. Specifically, the difference between the category of the selected sample and the corresponding sample category may first be calculated using a preset loss function. For example, the difference may be calculated using a cross-entropy loss function; using the cross-entropy loss function together with a sigmoid function avoids the problem of a slowed learning rate during gradient descent.
Third, in response to determining that the initial neural network reaches a preset optimization goal, the training is ended, and the trained initial neural network is used as the pre-trained neural network. Specifically, the preset optimization goal may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the calculated difference is less than a preset difference threshold.
Step 304, in response to determining that the initial neural network is not trained, adjusting relevant parameters in the initial neural network, and reselecting samples from the sample set, and performing the training step again using the adjusted initial neural network as the initial neural network.
In this embodiment, the executing subject of the training step adjusts the relevant parameters in the initial neural network in response to determining that the initial neural network is not trained, that is, in response to the initial neural network not reaching the optimization goal. Specifically, various implementations may be employed to adjust the network parameters of the initial neural network based on the difference between the category of the selected sample and the corresponding sample category. For example, the Adam algorithm, the BP (Back Propagation) algorithm, or the SGD (Stochastic Gradient Descent) algorithm may be used to adjust the network parameters of the initial neural network.
Optionally, the executing subject reselects a sample from the sample set, takes the sample image included in the sample as the input and the pre-obtained sample category corresponding to the sample image as the expected output, uses the adjusted initial neural network as the initial neural network, and performs step one again to train the neural network.
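The training and parameter-adjustment steps described above can be tied together in a minimal sketch such as the following; the choice of PyTorch, the Adam optimizer, the learning rate, and the default stopping bounds are all assumptions introduced for illustration only:

```python
import time
import torch
import torch.nn as nn

def train(network, sample_loader, max_seconds=3600, max_steps=10000, loss_threshold=1e-3):
    """Minimal sketch of steps 303 and 304: cross-entropy loss, parameter adjustment, and
    stopping once the preset time, iteration count, or difference threshold is reached."""
    criterion = nn.CrossEntropyLoss()                              # preset loss function
    optimizer = torch.optim.Adam(network.parameters(), lr=1e-3)    # Adam; BP/SGD would also work
    start, step = time.time(), 0
    while True:
        for sample_image, sample_class in sample_loader:           # reselect samples from the set
            prediction = network(sample_image)                      # category of the selected sample
            loss = criterion(prediction, sample_class)              # difference from the sample category
            optimizer.zero_grad()
            loss.backward()                                         # back propagation
            optimizer.step()                                        # adjust the network parameters
            step += 1
            if (time.time() - start > max_seconds or step >= max_steps
                    or loss.item() < loss_threshold):
                return network                                      # optimization goal reached
```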
In this embodiment, the executing subject of the training step determines the initial neural network obtained by training as a neural network trained in advance.
One embodiment presented in fig. 3 has the following beneficial effects: a neural network is trained based on the sample images and sample classes corresponding to the sample images. The neural network can be directly applied to determine the probability that the input image corresponds to the category. The first target image is directly input into the neural network without manual intervention or extraction of the characteristics of the image, and the probability of the first target image corresponding to the category can be automatically obtained.
Referring now to fig. 4, a block diagram of a computer system 400 suitable for use in implementing a server of an embodiment of the present disclosure is shown. The server shown in fig. 4 is only an example and should not impose any limitation on the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 4, the computer system 400 includes a Central Processing Unit (CPU) 401 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage section 406 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the system 400 are also stored. The CPU 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An Input/Output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: a storage section 406 including a hard disk and the like; and a communication section 407 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 407 performs communication processing via a network such as the Internet. A drive 408 is also connected to the I/O interface 405 as needed. A removable medium 409, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 408 as necessary, so that a computer program read therefrom is installed into the storage section 406 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 407 and/or installed from the removable medium 409. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by the Central Processing Unit (CPU) 401.
It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the C language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The foregoing description is only exemplary of the preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned technical features, and also encompasses other technical solutions formed by any combination of the above-mentioned technical features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.