Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is noted that embodiments of the invention and features of the embodiments may be combined with each other without conflict.
To solve the problems in the prior art, a first embodiment of the present invention provides a method for determining a picture type. As shown in Fig. 2, the method mainly includes:
Step S201, a target picture is acquired, and a plurality of first pictures are obtained based on processing pixels in the target picture.
Specifically, according to the embodiment of the invention, pixels in a target picture (that is, a picture whose content is to be reviewed and/or whose type is to be determined) are processed so that the positions of pixel points are shifted and/or the pixel values corresponding to the pixel points are changed, thereby obtaining a plurality of first pictures. While the pixels in the target picture are processed, the pixel points corresponding to the attack noise are also shifted in position and/or changed in value, so that the attack noise superimposed on the target picture becomes invalid. The attack noise is obtained by training on specific pictures with a gradient ascent method on the neural network. Consequently, even a displacement of a single neuron in the input-layer vector of the neural network (the picture flattened into one dimension), that is, a positional shift of the attack noise, can invalidate the specific attack vector (the feature vector corresponding to the attack noise, likewise flattened into one dimension), which improves the accuracy of type identification of the plurality of first pictures by the classification model.
Further, according to an embodiment of the present invention, the step of obtaining a plurality of first pictures based on processing pixels in a target picture includes:
performing data enhancement processing on the target picture to enable the position of the pixel point in the target picture to be shifted, and further obtaining a first picture, wherein the data enhancement processing comprises at least one of the following processing modes: rotation processing, reduction processing, enlargement processing, and translation processing.
It should be noted that, according to the embodiment of the present invention, the data enhancement processing described above displaces the pixel points in the target picture by a few pixels (for example, by one or two pixels). Shifting the attack noise superimposed on the target picture by such a small offset amplitude is enough to invalidate it. If the offset amplitude is too large, the recognition accuracy of the classification model is reduced.
With the above arrangement, data enhancement processing is performed on the target picture so that the positions of the pixel points in the target picture are shifted, which completes the processing of the pixels in the target picture and yields a plurality of first pictures. Meanwhile, the plurality of first pictures obtained by data enhancement processing of the target picture can all be classified by the same classification model to obtain a plurality of classification results, and the type of the target picture is determined according to these classification results. This avoids the higher cost incurred in the prior art by training a plurality of classification models separately.
Preferably, according to an embodiment of the present invention, when the data enhancement processing is at least one of rotation processing, reduction processing, and translation processing, the step of obtaining the first picture based on the data enhancement processing on the target picture includes:
processing the target picture according to the offset amplitude and the offset direction indicated by the data enhancement processing mode to obtain a second picture, wherein the offset amplitude is determined according to the size of the target picture;
determining a non-overlapping area corresponding to the target picture and the second picture;
adjusting the pixel value corresponding to each pixel point in the non-overlapping area to the pixel mean value of the target picture to obtain an adjusted non-overlapping area; and
combining the adjusted non-overlapping area with the overlapping area of the second picture and the target picture to obtain a first picture.
Through the arrangement, the pixel point is shifted by adopting the enhancement processing, so that the attack noise obtained through the targeted training is invalid, and the defending capability and stability of the classification model on the attack noise are improved.
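The four steps above can be illustrated for the translation case with the following minimal sketch. It is not the implementation of the embodiment: the function name `translate_with_mean_fill`, the parameters `dx` and `dy`, and the use of NumPy are assumptions made for illustration, and the pixel mean is taken over all values of the array (which matches a single-channel picture).

```python
import numpy as np

def translate_with_mean_fill(picture: np.ndarray, dx: int, dy: int) -> np.ndarray:
    """Shift `picture` by (dx, dy) pixels; fill the non-overlapping area with the pixel mean.

    Hypothetical helper: positive dx shifts right, positive dy shifts down.
    """
    h, w = picture.shape[:2]
    if abs(dx) >= w or abs(dy) >= h:
        raise ValueError("offset must be smaller than the picture size")
    # Canvas filled with the pixel mean of the target picture (adjusted non-overlapping area).
    first = np.full(picture.shape, picture.mean(), dtype=np.float64)
    # Destination and source windows of the overlapping area.
    dst_y = slice(max(dy, 0), h + min(dy, 0))
    dst_x = slice(max(dx, 0), w + min(dx, 0))
    src_y = slice(max(-dy, 0), h + min(-dy, 0))
    src_x = slice(max(-dx, 0), w + min(-dx, 0))
    # Combine: copy the shifted content into the overlapping area.
    first[dst_y, dst_x] = picture[src_y, src_x]
    return first
```

For example, `translate_with_mean_fill(picture, 1, 0)` moves the content one pixel to the right and sets the leftmost column to the mean of the target picture.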
Preferably, according to an embodiment of the present invention, when the data enhancement processing is the enlarging processing, the step of obtaining the first picture based on the data enhancement processing on the target picture includes:
amplifying the target picture according to the amplification proportion to obtain a second picture;
and determining the overlapping area of the second picture and the target picture as the first picture.
With this arrangement, the influence of the attack noise on the target picture is significantly reduced in the obtained first picture, which improves the accuracy of the classification result obtained when the classification model is used to determine the picture type.
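The enlargement case can be sketched as follows. The source does not state how the second picture is positioned relative to the target picture, so this sketch assumes the two are center-aligned; the nearest-neighbor enlargement and the name `enlarge_and_crop` are likewise illustrative assumptions.

```python
import numpy as np

def enlarge_and_crop(picture: np.ndarray, ratio: float) -> np.ndarray:
    """Enlarge `picture` by `ratio` and keep the area overlapping the original frame.

    Hypothetical helper; assumes ratio > 1 and a center-aligned overlap.
    """
    if ratio <= 1.0:
        raise ValueError("ratio must be greater than 1 for enlargement")
    h, w = picture.shape[:2]
    new_h, new_w = int(round(h * ratio)), int(round(w * ratio))
    # Nearest-neighbor enlargement: map each enlarged pixel back to a source pixel.
    rows = np.minimum((np.arange(new_h) / ratio).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / ratio).astype(int), w - 1)
    second = picture[rows][:, cols]
    # Overlapping area of the second picture and the target picture (same size as the target).
    top, left = (new_h - h) // 2, (new_w - w) // 2
    return second[top:top + h, left:left + w]
```

The returned first picture has the same size as the target picture, but every pixel has moved relative to its original position.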
Alternatively, according to an embodiment of the present invention, the step of obtaining a plurality of first pictures based on processing pixels in a target picture further includes:
performing filtering processing on pixels in the target picture so that the pixel values corresponding to pixel points in the target picture are changed, thereby obtaining a first picture; wherein the filtering processing includes at least one of the following processing modes: box filter processing, mean filter processing, Gaussian filter processing, median filter processing and bilateral filter processing.
It should be noted that, according to the embodiment of the present invention, the filtering processing described above changes the pixel values of the pixel points in the target picture by a small difference (for example, a pixel difference of one or two). Changing the attack noise superimposed on the target picture by such a small pixel difference is enough to invalidate it. If the pixel difference is too large, the recognition accuracy of the classification model is reduced.
The filtering processing is also called smoothing processing, and it suppresses noise by changing pixel values. Mean filtering is a typical linear filter: a template is assigned to the target pixel on the image, the template consists of the surrounding adjacent pixels (the 8 pixels centered on the target pixel form the filtering template, i.e., the target pixel itself is excluded), and the average value of all pixels in the template replaces the original pixel value. Box filtering uses essentially the same kernel as mean filtering; the difference is that normalization is not required. Median filtering is a nonlinear smoothing technique that sets the gray value of each pixel to the median of the gray values of all pixels within a neighborhood window of that pixel, i.e., the value of the center pixel is replaced with the median (not the average) of all pixel values in the window. By selecting the median, median filtering avoids the influence of isolated noise points in the image, filters impulse noise well, and in particular keeps signal edges from blurring while removing noise. Gaussian filtering is a linear smoothing filter suited to eliminating Gaussian noise; specifically, it is a weighted averaging over the whole image in which the value of each pixel point is obtained as a weighted average of its own value and the other pixel values in its neighborhood. Bilateral filtering (bilateral filter) is a nonlinear filtering method that constructs a weighted average from each pixel and its neighborhood. The weight has two parts: the first part is weighted in the same way as in Gaussian smoothing; the second part is also a Gaussian weight, but it is based on the brightness difference between the other pixels and the central pixel point rather than on the spatial distance between them.
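As a minimal sketch of two of the filters described above, the following code applies a 3x3 mean filter and a 3x3 median filter to a single-channel picture. The function names, the edge-replication padding, and the `include_center` switch (which selects between the 8-neighbor template described above and the more common 9-pixel window) are assumptions for illustration only.

```python
import numpy as np

def _windows_3x3(picture: np.ndarray) -> np.ndarray:
    """Stack the 9 shifted copies that make up every 3x3 neighborhood (edges replicated)."""
    padded = np.pad(picture.astype(np.float64), 1, mode="edge")
    h, w = picture.shape
    return np.stack([padded[r:r + h, c:c + w] for r in range(3) for c in range(3)])

def mean_filter(picture: np.ndarray, include_center: bool = True) -> np.ndarray:
    """Replace each pixel with the average of its 3x3 template."""
    win = _windows_3x3(picture)
    if include_center:
        return win.mean(axis=0)
    # Index 4 is the center of the 3x3 window; exclude it and average the 8 neighbors.
    return (win.sum(axis=0) - win[4]) / 8.0

def median_filter(picture: np.ndarray) -> np.ndarray:
    """Replace each pixel with the median of its 3x3 neighborhood."""
    return np.median(_windows_3x3(picture), axis=0)
```

On a picture that is zero everywhere except for one isolated bright pixel, the median filter removes the bright pixel entirely, while the mean filter spreads it over the neighborhood.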
According to the embodiment of the invention, the target picture may be processed at least once by the data enhancement processing and/or the filtering processing. If the target picture is processed multiple times in the same processing mode, the pixel points must change differently in each pass; for example, if the target picture is rotated multiple times, the rotation angle or direction differs each time.
Step S202, feature vectors in a plurality of first pictures are respectively extracted based on the neural network model, the feature vectors are input into the classification model for classification processing, a plurality of classification results are obtained, and the plurality of classification results correspond to the plurality of first pictures.
Specifically, the plurality of first pictures are respectively input into the neural network model, and the feature vector corresponding to each first picture is output from the output layer of the neural network model. Because the first pictures are obtained by processing the pixels of the target picture, their feature vectors differ from the feature vector of the target picture, which disrupts the attack noise. The feature vector of each first picture is then classified by the classification model to obtain the classification result corresponding to that first picture. This arrangement avoids using a plurality of classification models at the same time and reduces the cost of training classification models with different structures. The classification model may have one of the following structures: logistic regression (a classical classification model that can handle both binary and multi-class classification); SVM (support vector machine, which shows many unique advantages in small-sample, nonlinear and high-dimensional pattern recognition and can be generalized to other machine learning problems such as function fitting); XGBoost (eXtreme Gradient Boosting, a boosted tree model that integrates many tree models to form a strong classifier); and so on. The classification model is already trained, using an existing training method.
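The data flow of step S202 can be sketched as below: every first picture passes through the same feature extractor and the same classifier. The names `classify_first_pictures`, `toy_extract` and `toy_classify` are hypothetical, and the two toy functions merely stand in for a trained neural network model and a trained classification model.

```python
from typing import Callable, List, Sequence

import numpy as np

def classify_first_pictures(
    first_pictures: Sequence[np.ndarray],
    extract_features: Callable[[np.ndarray], np.ndarray],
    classify: Callable[[np.ndarray], int],
) -> List[int]:
    """Return one classification result per first picture, using a single shared model pair."""
    return [classify(extract_features(picture)) for picture in first_pictures]

# Toy stand-ins (hypothetical): a real system would use trained models here.
def toy_extract(picture: np.ndarray) -> np.ndarray:
    return np.array([picture.mean(), picture.std()])

def toy_classify(features: np.ndarray) -> int:
    return int(features[0] > 0.5)
```

The list returned by `classify_first_pictures` has the same order and length as the input first pictures, so each classification result corresponds to one first picture.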
Step S203, determining the type of the target picture according to a plurality of classification results.
The type of the picture can be set manually. For example, in a picture review process, an illegal type and a pornographic ("yellow") type can be defined according to the relevant requirements, and pictures of the illegal type and of the pornographic type can be set as pictures of the propagation-prohibited type.
Specifically, according to an embodiment of the present invention, the step of determining the type of the target picture according to the plurality of classification results includes:
and determining the type of the target picture according to the plurality of classification results and the voting rule.
The voting rule is a majority voting rule: an evaluation result of a certain type is output only when it obtains a number of votes greater than a certain threshold; otherwise, no evaluation result is output. When no evaluation result is output, the evaluation may be performed by manual intervention. The threshold is typically an absolute majority, i.e., more than half of the total number of votes, and may also be set according to the actual situation. With this arrangement, introducing the majority voting rule into the process of determining the type of the target picture improves the accuracy of the determined picture type.
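The majority voting rule can be sketched as follows. The function name `majority_vote` and the use of `None` to signal "no result output, hand over to manual review" are assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

def majority_vote(
    results: Sequence[Hashable], threshold: Optional[int] = None
) -> Optional[Hashable]:
    """Return the winning type only if its votes exceed `threshold`, else None.

    With the default threshold (half of the total, rounded down), the winner
    must hold an absolute majority of the votes.
    """
    if not results:
        return None
    if threshold is None:
        threshold = len(results) // 2
    label, votes = Counter(results).most_common(1)[0]
    return label if votes > threshold else None
```

A returned `None` corresponds to the case in which no evaluation result is output and manual intervention may be used instead.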
Alternatively, according to an embodiment of the present invention, the step of determining the type of the target picture according to the plurality of classification results further includes:
carrying out weighting processing on the plurality of classification results, and determining the type of the target picture according to the weighting processing result and a result quantity threshold.
Because the same classification model is used to classify every first picture, the weighting only needs to set each weight to 1/N, where N is the number of classification results.
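One possible reading of the equal-weight scheme is sketched below. The source does not spell out how the weighted result is compared with the result quantity threshold, so the comparison used here (the weighted score of the leading type must reach `quantity_threshold / N`) is an assumption, as is the function name `weighted_decision`. Exact fractions are used to avoid floating-point rounding when summing 1/N.

```python
from fractions import Fraction
from typing import Dict, Hashable, Optional, Sequence

def weighted_decision(
    results: Sequence[Hashable], quantity_threshold: int
) -> Optional[Hashable]:
    """Weight each of the N results by 1/N and accept the leading type if its
    weighted score reaches quantity_threshold / N (assumed comparison rule)."""
    n = len(results)
    if n == 0:
        raise ValueError("at least one classification result is required")
    weight = Fraction(1, n)
    scores: Dict[Hashable, Fraction] = {}
    for label in results:
        scores[label] = scores.get(label, Fraction(0)) + weight
    best = max(scores, key=scores.get)
    return best if scores[best] >= Fraction(quantity_threshold, n) else None
```

Under this reading, equal weights make the weighted score of a type equal to its share of the N classification results.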
According to the technical scheme of the embodiment of the invention, a target picture is acquired, and a plurality of first pictures are obtained by processing pixels in the target picture; feature vectors of the plurality of first pictures are respectively extracted based on the neural network model and input into the classification model for classification processing to obtain a plurality of classification results corresponding to the plurality of first pictures; and the type of the target picture is determined according to the plurality of classification results. These technical means solve the technical problems of the prior art, namely that classification models with different structures must be trained, so that determining the picture type is costly; that the weight coefficients of the individual classification models are difficult to set in a unified way; that the classification model defends poorly and unstably against attack noise; and that the determined picture type has low accuracy. Because a plurality of first pictures are obtained by processing pixels in the picture to be classified and their picture types are identified by the same classification model, the cost of determining the picture type is reduced, the defense capability and stability of the classification model against attack noise are improved, and the accuracy of the determined picture type is improved.
Fig. 3a is a schematic diagram of a main flow of a method for determining a picture type according to a second embodiment of the present invention; as shown in fig. 3a, the method for determining a picture type provided by the embodiment of the present invention mainly includes:
Step S301, obtaining a target picture.
The target picture refers to a picture whose content is to be reviewed and/or whose type is to be determined; typically it is superimposed with attack noise that interferes with the classification model. The target picture may be any picture, for example, a picture containing flowers or a picture containing jeans.
Step S302, when the data enhancement processing is at least one of rotation processing, reduction processing, and translation processing, the target picture is processed according to the offset amplitude and the offset direction indicated by the data enhancement processing mode to obtain a second picture.
It should be noted that, according to the embodiment of the present invention, the data enhancement processing described above displaces the pixel points in the target picture by a few pixels (for example, by one or two pixels). Shifting the attack noise superimposed on the target picture by such a small offset amplitude is enough to invalidate it. If the offset amplitude is too large, the recognition accuracy of the classification model is reduced.
With the above arrangement, data enhancement processing is performed on the target picture so that the positions of the pixel points in the target picture are shifted. Figs. 3b, 3c, and 3d are schematic diagrams of the rotation processing, the reduction processing, and the translation processing performed on the target picture, respectively, but the present invention is not limited thereto. If the target picture is processed multiple times (twice or more) in the same processing mode, the offset amplitude and the offset direction of each pass (such as the rotation angle, the rotation direction, the reduction ratio (reduction amplitude), the enlargement direction, the enlargement ratio (enlargement amplitude), etc.) need to be adjusted so that the pixel points change differently in each pass.
According to an embodiment of the present invention, the offset magnitude is determined according to the size of the target picture.
In this embodiment, in implementation, if the number of pixel points of the target picture is greater than a preset number, at least two pixel points are used as the offset; if the number of pixel points of the target picture is less than or equal to the preset number, one pixel point is used as the offset. Each target picture can thereby be processed flexibly. In addition, the offset (offset amplitude) should not be too large, so as not to affect the accuracy of picture recognition. The preset number can be set as required; in general, an offset amplitude of one or two pixels is sufficient to achieve the effect of the embodiment of the invention.
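The size-dependent choice of offset can be sketched as follows. The function name `choose_offset` and the parameter `large_offset` (the offset used for pictures above the preset number, at least two pixels) are hypothetical names introduced for illustration.

```python
def choose_offset(width: int, height: int, preset_pixel_count: int, large_offset: int = 2) -> int:
    """Pick the offset amplitude in pixels from the number of pixel points in the picture."""
    if large_offset < 2:
        raise ValueError("large_offset must be at least two pixels")
    # More pixel points than the preset number: shift by at least two pixels; otherwise by one.
    return large_offset if width * height > preset_pixel_count else 1
```

A picture whose pixel count equals the preset number falls in the "less than or equal" branch and is shifted by one pixel.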
Step S303, determining a non-overlapping area corresponding to the target picture and the second picture.
As shown in Fig. 3d, taking translation processing of a target picture as an example, the box drawn with a dotted line is the target picture. When the target picture is translated to the right, the area drawn with a solid line is obtained as the second picture; the area between the left dotted line and the left solid line is then the non-overlapping area corresponding to the target picture and the second picture.
Step S304, the pixel value corresponding to each pixel point in the non-overlapping area is adjusted to be the pixel mean value of the target picture, and the adjusted non-overlapping area is obtained.
Specifically, the pixel values corresponding to all pixel points in the target picture are obtained and added together to obtain a total pixel value; the total pixel value is divided by the number of pixel points of the target picture to obtain the pixel mean value of the target picture, and the pixel mean value is used as the pixel value corresponding to each pixel point in the non-overlapping area.
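The computation above can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the source: H and W denote the height and width of the target picture in pixels, p_{i,j} the pixel value at row i and column j, R the set of pixel points in the non-overlapping area, and q_{i,j} the adjusted pixel value. A single-channel picture is assumed.

```latex
\[
\mu = \frac{1}{H\,W} \sum_{i=1}^{H} \sum_{j=1}^{W} p_{i,j},
\qquad
q_{i,j} = \mu \quad \text{for every } (i,j) \in R .
\]
```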
Step S305, combining the adjusted non-overlapping area and the overlapping area of the second picture and the target picture to obtain a first picture.
Steps S302 to S305 above yield the first picture obtained by processing the target picture through at least one of the rotation processing, the reduction processing, and the translation processing. Because the pixel positions in the first picture change only slightly, the neural network model can still extract the features of a normal picture as usual, whereas for a picture superimposed with attack noise the attack noise becomes invalid, which improves the recognition accuracy of the subsequent classification model.
Step S306, in the case that the data enhancement processing is enlargement processing, the target picture is enlarged according to the enlargement ratio to obtain a second picture.
Step S307, the overlapping area of the second picture and the target picture is determined as the first picture.
Steps S306 to S307 above yield the first picture obtained by enlarging the target picture. Here too, the attack noise is invalidated by adjusting pixel positions.
Step S308, performing filtering processing on pixels in the target picture to change pixel values corresponding to pixel points in the target picture, thereby obtaining a first picture.
With this arrangement, the pixel values are changed by filtering processing, i.e., smoothing processing, so that the activation state of each layer of the neural network model changes and the feature vector of the first picture differs from the feature vector of the target picture. This achieves the technical effects of eliminating noise, invalidating the attack noise, and improving the defense capability and stability of the classification model against attack noise.
According to the embodiment of the invention, the target picture may be processed at least once by the data enhancement processing and/or the filtering processing. If the target picture is processed multiple times in the same processing mode, the pixel points must change differently in each pass; for example, if the target picture is rotated multiple times, the rotation angle or direction differs each time.
Step S309, extracting feature vectors in the first pictures based on the neural network model, and inputting the feature vectors into the classification model for classification processing to obtain a plurality of classification results, wherein the plurality of classification results correspond to the first pictures.
Specifically, the plurality of first pictures are respectively input into the neural network model, and the feature vector corresponding to each first picture is output from the output layer of the neural network model. Because the first pictures are obtained by processing the pixels of the target picture, their feature vectors differ from the feature vector of the target picture, which disrupts the attack noise. The feature vector of each first picture is then classified by the classification model to obtain the classification result corresponding to that first picture. This arrangement avoids using a plurality of classification models at the same time and reduces the cost of training classification models with different structures. The classification model may have one of the following structures: logistic regression (a classical classification model that can handle both binary and multi-class classification); SVM (support vector machine, which shows many unique advantages in small-sample, nonlinear and high-dimensional pattern recognition and can be generalized to other machine learning problems such as function fitting); and XGBoost (eXtreme Gradient Boosting, a boosted tree model that integrates many tree models to form a strong classifier).
Step S310, determining the type of the target picture according to the plurality of classification results and a voting rule.
The voting rule is a majority voting rule: an evaluation result of a certain type is output only when it obtains a number of votes greater than a certain threshold; otherwise, no evaluation result is output. When no evaluation result is output, the evaluation may be performed by manual intervention. The threshold is typically an absolute majority, i.e., more than half of the total number of votes, and may also be set according to the actual situation. With this arrangement, introducing the majority voting rule into the process of determining the type of the target picture improves the accuracy of the determined picture type.
The type of the picture can be set manually. For example, in a picture review process, an illegal type and a pornographic ("yellow") type can be defined according to the relevant requirements, and pictures of the illegal type and of the pornographic type can be set as pictures of the propagation-prohibited type.
According to the embodiment of the invention, the classification result may be the probability that the picture is a propagation-prohibited picture. For each first picture, if its probability is greater than a first probability, that first picture is regarded as a propagation-prohibited picture. The number of first pictures regarded as propagation-prohibited pictures is then counted. If this number is greater than or equal to a first number (majority voting principle), the type of the target picture is determined to be a propagation-prohibited picture; if this number is less than or equal to a second number, the type of the target picture is determined not to be a propagation-prohibited picture, where the first number is greater than the second number; and if this number is greater than the second number and less than the first number, the type of the target picture is determined to be an unrecognizable type.
In implementation, the first number and the second number may be set manually. The smaller the absolute value of the difference between the first number and the second number, the smaller the amount of manual intervention and the lower the accuracy of the determined type of the target picture; the larger the absolute value, the larger the amount of manual intervention and the higher the accuracy. For example, the first number is 15 and the second number is 5. In addition, the first number is less than the number of first pictures, and the second number is greater than zero. The first probability may be set, for example, to 0.6. The higher the accuracy of the classification model (i.e., the better its performance), the lower the first probability and the smaller the absolute value can be (thus reducing manual effort); the lower the accuracy of the classification model, the higher the first probability and the larger the absolute value. First information and second information may also be set, the first information being different from the second information; for example, the first information may be 1 and the second information may be 0. Specifically, for a target picture whose type the classification model cannot identify, the type may be judged manually: if the target picture is manually determined to be a propagation-prohibited picture, 1 is input manually, and on receiving 1 the type of the target picture is determined to be a propagation-prohibited picture; if the target picture is manually determined not to be a propagation-prohibited picture, 0 is input manually, and on receiving 0 the type of the target picture is determined not to be a propagation-prohibited picture.
In this embodiment, whether the target picture is a propagation-prohibited picture is judged by comparing the number of first pictures regarded as propagation-prohibited pictures with the range formed by the first number and the second number, so that only a reliable type of the target picture is output directly, which improves the accuracy of the determined type of the target picture.
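The three-way decision described above can be sketched as follows, using the example values from the text (first probability 0.6, first number 15, second number 5) as defaults. The function name `decide_type` and the returned strings are hypothetical labels chosen for illustration.

```python
from typing import Sequence

def decide_type(
    probabilities: Sequence[float],
    first_probability: float = 0.6,
    first_number: int = 15,
    second_number: int = 5,
) -> str:
    """Decide the target picture type from the per-first-picture probabilities."""
    # Constraints stated in the text: 0 < second number < first number < number of first pictures.
    if not 0 < second_number < first_number < len(probabilities):
        raise ValueError("require 0 < second_number < first_number < number of first pictures")
    # Count the first pictures regarded as propagation-prohibited pictures.
    prohibited = sum(1 for p in probabilities if p > first_probability)
    if prohibited >= first_number:
        return "prohibited"
    if prohibited <= second_number:
        return "not_prohibited"
    return "unrecognizable"  # left to manual judgment
```

With 20 first pictures, a count of 16 yields "prohibited", a count of 3 yields "not_prohibited", and a count of 10 falls between the two numbers and yields "unrecognizable".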
Step S311, weighting the multiple classification results, and determining the type of the target picture according to the weighted classification results and the threshold of the number of results.
Because the same classification model is used to classify every first picture, the weighting only needs to set each weight to 1/N, where N is the number of classification results.
According to the technical scheme of the embodiment of the invention, a target picture is acquired, and a plurality of first pictures are obtained by processing pixels in the target picture; feature vectors of the plurality of first pictures are respectively extracted based on the neural network model and input into the classification model for classification processing to obtain a plurality of classification results corresponding to the plurality of first pictures; and the type of the target picture is determined according to the plurality of classification results. These technical means solve the technical problems of the prior art, namely that classification models with different structures must be trained, so that determining the picture type is costly; that the weight coefficients of the individual classification models are difficult to set in a unified way; that the classification model defends poorly and unstably against attack noise; and that the determined picture type has low accuracy. Because a plurality of first pictures are obtained by processing pixels in the picture to be classified and their picture types are identified by the same classification model, the cost of determining the picture type is reduced, the defense capability and stability of the classification model against attack noise are improved, and the accuracy of the determined picture type is improved.
Fig. 4 is a schematic diagram of main modules of an apparatus for determining a picture type according to an embodiment of the present invention; as shown in fig. 4, an apparatus 400 for determining a picture type according to an embodiment of the present invention mainly includes:
the target picture obtaining module 401 is configured to obtain a target picture, and obtain a plurality of first pictures based on processing pixels in the target picture.
Specifically, according to the embodiment of the invention, pixels in a target picture (that is, a picture whose content is to be reviewed and/or whose type is to be determined) are processed so that the positions of pixel points are shifted and/or the pixel values corresponding to the pixel points are changed, thereby obtaining a plurality of first pictures. While the pixels in the target picture are processed, the pixel points corresponding to the attack noise are also shifted in position and/or changed in value, so that the attack noise superimposed on the target picture becomes invalid, which improves the accuracy of type identification of the plurality of first pictures by the classification model.
Further, according to an embodiment of the present invention, the target picture obtaining module 401 is further configured to:
performing data enhancement processing on the target picture to enable the position of the pixel point in the target picture to be shifted, and further obtaining a first picture, wherein the data enhancement processing comprises at least one of the following processing modes: rotation processing, reduction processing, enlargement processing, and translation processing.
With this arrangement, the data enhancement processing shifts the positions of the pixel points in the target picture, which completes the processing of the pixels in the target picture and yields a plurality of first pictures. The plurality of first pictures obtained in this way can all be classified by the same classification model to obtain a plurality of classification results, and the type of the target picture is determined according to those results. This avoids the higher cost incurred in the prior art by training a plurality of classification models separately.
Preferably, according to an embodiment of the present invention, in a case where the data enhancement processing is at least one of rotation processing, reduction processing, and translation processing, the above-described target picture obtaining module 401 is further configured to:
processing the target picture according to the offset amplitude and the offset direction indicated by the data enhancement processing mode to obtain a second picture, wherein the offset amplitude is determined according to the size of the target picture;
determining the non-overlapping region between the target picture and the second picture;
adjusting the pixel value corresponding to each pixel point in the non-overlapping region to the pixel mean value of the target picture to obtain an adjusted non-overlapping region;
and combining the adjusted non-overlapping region with the overlapping region of the second picture and the target picture to obtain a first picture.
With this arrangement, the data enhancement processing shifts the pixel points, so that attack noise obtained through targeted training becomes ineffective, and the defense capability and stability of the classification model against attack noise are improved.
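The translation case of the steps above can be sketched as follows. This is a minimal illustration only, assuming a two-dimensional grayscale picture held in a NumPy array; the function names and the 10% rule used in `offset_from_size` are hypothetical and do not come from the embodiment, which only states that the offset amplitude is determined according to the size of the target picture.

```python
import numpy as np


def translate_with_mean_fill(picture, dx, dy):
    """Shift a 2-D grayscale picture by (dx, dy) pixels.

    Pixels that the shifted picture no longer covers (the non-overlapping
    region) are set to the mean pixel value of the original picture.
    """
    h, w = picture.shape
    if abs(dx) >= w or abs(dy) >= h:
        raise ValueError("offset must be smaller than the picture size")
    # Canvas pre-filled with the pixel mean of the target picture.
    first = np.full((h, w), picture.mean(), dtype=np.float64)
    # Overlapping region: where the shifted picture lands inside the frame.
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    first[dst_rows, dst_cols] = picture[src_rows, src_cols]
    return first


def offset_from_size(size, ratio=0.1):
    """Hypothetical rule: the offset amplitude is a fixed fraction of the size."""
    return max(1, int(round(size * ratio)))


picture = np.arange(1, 10, dtype=np.float64).reshape(3, 3)
first = translate_with_mean_fill(picture, dx=offset_from_size(3), dy=0)
```

In this example the picture is shifted one pixel to the right, so the leftmost column is the non-overlapping region and receives the mean value of the original picture.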
Preferably, according to an embodiment of the present invention, in a case where the data enhancement processing is enlargement processing, the above-mentioned target picture obtaining module 401 is further configured to:
amplifying the target picture according to the amplification proportion to obtain a second picture;
and determining the overlapping area of the second picture and the target picture as the first picture.
With this arrangement, the influence of the attack noise is significantly reduced in the obtained first picture, and the accuracy of the classification result obtained when the classification model is used to determine the picture type is improved.
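The enlargement case can be sketched in the same way. The embodiment does not say how the enlarged second picture is aligned with the target picture or which interpolation is used, so the sketch below assumes centered alignment and nearest-neighbour interpolation; the function name `enlarge_and_crop` is hypothetical.

```python
import numpy as np


def enlarge_and_crop(picture, scale):
    """Enlarge a picture by `scale` and keep the region that overlaps the
    original frame (assumed here to be the centered window)."""
    if scale < 1:
        raise ValueError("scale must be at least 1 for enlargement")
    h, w = picture.shape[:2]
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour enlargement: map each output index to a source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    second = picture[rows][:, cols]
    # The overlap with the target picture is a window of the original size.
    top, left = (new_h - h) // 2, (new_w - w) // 2
    return second[top:top + h, left:left + w]


picture = np.arange(16).reshape(4, 4)
first = enlarge_and_crop(picture, scale=2)
```

The first picture keeps the size of the target picture, but each attack-noise pixel now occupies a different position and covers a different area than in the original.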
Alternatively, according to an embodiment of the present invention, the target picture obtaining module 401 is further configured to:
performing filtering processing on the pixels in the target picture so that the pixel values corresponding to the pixel points in the target picture are changed, thereby obtaining a first picture; wherein the filtering processing includes at least one of the following processing modes: box filtering, mean filtering, Gaussian filtering, median filtering, and bilateral filtering.
Filtering processing, also called smoothing processing, suppresses noise by changing pixel values.
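As an illustration of how filtering changes pixel values, the following sketch implements a mean filter and a median filter with NumPy only. It assumes a two-dimensional grayscale picture, an odd window size `k`, and edge-replicating padding; these choices and the helper names are assumptions made for the example, not details of the embodiment.

```python
import numpy as np


def _neighbourhoods(picture, k):
    """Stack the k*k shifted copies of the picture so that axis 0 lists
    every pixel's neighbourhood (edges are padded by replication)."""
    if k % 2 == 0:
        raise ValueError("window size k must be odd")
    pad = k // 2
    padded = np.pad(picture.astype(np.float64), pad, mode="edge")
    h, w = picture.shape
    return np.stack([padded[i:i + h, j:j + w]
                     for i in range(k) for j in range(k)])


def mean_filter(picture, k=3):
    """Replace each pixel with the average of its k x k neighbourhood."""
    return _neighbourhoods(picture, k).mean(axis=0)


def median_filter(picture, k=3):
    """Replace each pixel with the median of its k x k neighbourhood."""
    return np.median(_neighbourhoods(picture, k), axis=0)


# A single bright pixel stands in for isolated noise.
picture = np.zeros((5, 5))
picture[2, 2] = 9.0
smoothed = mean_filter(picture)
despeckled = median_filter(picture)
```

The mean filter spreads the isolated value over its neighbours, while the median filter removes it entirely; either way the pixel values that a crafted perturbation relies on are no longer intact.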
The classification processing module 402 is configured to extract feature vectors in the plurality of first pictures based on the neural network model, and input the feature vectors into the classification model for classification processing, so as to obtain a plurality of classification results, where the plurality of classification results correspond to the plurality of first pictures.
Specifically, the plurality of first pictures are respectively input into the neural network model, and the feature vector corresponding to each first picture is output from the output layer of the neural network model. Because the first pictures are obtained by processing the pixels of the target picture, their feature vectors differ from the feature vector of the target picture, which disrupts the attack noise. The feature vectors corresponding to the first pictures are then classified one by one with the classification model to obtain the classification result corresponding to each first picture. This arrangement avoids using a plurality of classification models at the same time and reduces the cost of training classification models with different structures.
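The point that a single feature extractor and a single classification model are reused for every first picture can be sketched as below. The callables `extract_features` and `classify` are hypothetical placeholders for the neural network model and the classification model; the toy stand-ins in the usage lines exist only to make the sketch runnable.

```python
from typing import Any, Callable, List, Sequence


def classify_first_pictures(
    first_pictures: Sequence[Any],
    extract_features: Callable[[Any], Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> List[str]:
    """Apply the same feature extractor and the same classifier to every
    first picture, returning one classification result per picture."""
    return [classify(extract_features(picture)) for picture in first_pictures]


# Toy stand-ins (assumptions): a "picture" is a flat list of pixel values,
# the feature vector is its mean, and the classifier thresholds that mean.
def toy_features(picture):
    return [sum(picture) / len(picture)]


def toy_classifier(features):
    return "bright" if features[0] > 0.5 else "dark"


results = classify_first_pictures(
    [[0.9, 0.8], [0.7, 0.6], [0.1, 0.2]], toy_features, toy_classifier
)
```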
A type determining module 403, configured to determine a type of the target picture according to the multiple classification results.
Specifically, according to an embodiment of the present invention, the type determining module 403 is further configured to:
determining the type of the target picture according to the plurality of classification results and a majority voting rule.
Under the majority voting rule, a result is output only when the classification results of a certain type receive a number of votes greater than a certain threshold; otherwise, no result is output. When no result is output, the type may be determined through manual intervention. The threshold is typically an absolute majority, that is, greater than half of the total number of votes, and may also be set according to the actual situation. With this arrangement, introducing the majority voting rule into the process of determining the type of the target picture improves the accuracy of the determined picture type.
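A minimal sketch of such a voting rule is given below. The function name `majority_vote` and the convention of returning `None` when no type exceeds the threshold (signalling that manual intervention is needed) are assumptions for illustration.

```python
from collections import Counter


def majority_vote(results, threshold=None):
    """Return the type whose vote count exceeds the threshold, else None.

    By default the threshold is half of the total number of votes, i.e.
    an absolute majority is required.
    """
    if not results:
        return None
    if threshold is None:
        threshold = len(results) / 2
    label, votes = Counter(results).most_common(1)[0]
    return label if votes > threshold else None


decided = majority_vote(["cat", "cat", "dog"])
undecided = majority_vote(["cat", "dog"])
```

Here `decided` is `"cat"` because two of three votes exceed the threshold of 1.5, while `undecided` is `None` because neither type receives more than half of the votes.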
Alternatively, according to an embodiment of the present invention, the above-mentioned type determining module 403 is further configured to:
performing weighting processing on the plurality of classification results, and determining the type of the target picture according to the weighting processing result and a result quantity threshold.
Because the same classification model is used to classify every first picture, the weighting only requires setting each weight to 1/N, where N is the number of classification results.
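One possible reading of the weighting step is sketched below: each classification result contributes its weight to the score of its type, and the best-scoring type is accepted only if its score exceeds a threshold. Expressing the result quantity threshold as a fraction of the total weight (0.5 by default) and the function name `weighted_decision` are assumptions; the embodiment does not specify them.

```python
from collections import defaultdict


def weighted_decision(results, weights=None, threshold=0.5):
    """Sum the weights per type and return the best type if its total
    weight exceeds the threshold; otherwise return None."""
    n = len(results)
    if n == 0:
        return None
    if weights is None:
        # Same classification model for every first picture: equal weights 1/N.
        weights = [1.0 / n] * n
    scores = defaultdict(float)
    for label, weight in zip(results, weights):
        scores[label] += weight
    best = max(scores, key=scores.get)
    return best if scores[best] > threshold else None


decided = weighted_decision(["cat", "cat", "dog"])
undecided = weighted_decision(["cat", "dog"])
```

With equal weights of 1/N this reduces to counting votes, which is why no per-model weight coefficients need to be tuned.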
According to the technical solution of the embodiment of the present invention, a target picture is acquired and a plurality of first pictures are obtained by processing pixels in the target picture; feature vectors are extracted from each of the first pictures based on the neural network model and input into the classification model for classification, yielding a plurality of classification results that correspond to the plurality of first pictures; and the type of the target picture is determined according to the plurality of classification results. These technical means overcome the problems of the prior art, in which classification models with different structures must be trained, so that determining the picture type is costly; the weight coefficients of the individual classification models are difficult to set in a unified way; the defense capability and stability of the classification models against attack noise are poor; and the accuracy of the determined picture type is low. Because the plurality of first pictures are obtained by processing pixels in the picture to be classified and their types are identified by the same classification model, the cost of determining the picture type is reduced, the defense capability and stability of the classification model against attack noise are improved, and the accuracy of the determined picture type is improved.
Fig. 5 illustrates an exemplary system architecture 500 to which the method for determining a picture type or the apparatus for determining a picture type of the embodiment of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves as a medium providing communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired or wireless communication links, or fiber optic cables, among others.
A user may interact with the server 505 via the network 504 using the terminal devices 501, 502, 503 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (by way of example only) that supports shopping websites browsed by users with the terminal devices 501, 502, 503. The background management server may analyze and otherwise process received data such as the target picture, and feed back the processing result (e.g., the first pictures, the classification results, and the type of the target picture; by way of example only) to the terminal device.
It should be noted that the method for determining a picture type provided in the embodiment of the present invention is executed by the server 505 or a terminal device, and accordingly, the apparatus for determining a picture type is disposed in the server 505 or the terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 6 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 601.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may be described, for example, as: a processor including a target picture obtaining module, a classification processing module, and a type determining module. The names of these modules do not, in some cases, limit the modules themselves; for example, the target picture obtaining module may also be described as "a module for acquiring a target picture and obtaining a plurality of first pictures by processing pixels in the target picture".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring a target picture, and processing pixels in the target picture to obtain a plurality of first pictures; extracting feature vectors from each of the first pictures based on the neural network model, and inputting the feature vectors into the classification model for classification to obtain a plurality of classification results, wherein the plurality of classification results correspond to the plurality of first pictures; and determining the type of the target picture according to the plurality of classification results.
According to the technical solution of the embodiment of the present invention, a target picture is acquired and a plurality of first pictures are obtained by processing pixels in the target picture; feature vectors are extracted from each of the first pictures based on the neural network model and input into the classification model for classification, yielding a plurality of classification results that correspond to the plurality of first pictures; and the type of the target picture is determined according to the plurality of classification results. These technical means overcome the problems of the prior art, in which classification models with different structures must be trained, so that determining the picture type is costly; the weight coefficients of the individual classification models are difficult to set in a unified way; the defense capability and stability of the classification models against attack noise are poor; and the accuracy of the determined picture type is low. Because the plurality of first pictures are obtained by processing pixels in the picture to be classified and their types are identified by the same classification model, the cost of determining the picture type is reduced, the defense capability and stability of the classification model against attack noise are improved, and the accuracy of the determined picture type is improved.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.