Method for detecting a device placed on a display screen based on a multilayer convolutional neural network

Technical Field
The invention relates to multi-screen interaction technology, and in particular to a method, based on a multilayer convolutional neural network, for detecting whether a device is placed on a display screen.
Background
Multi-screen interaction means that operations such as transmission, parsing, display, and control of multimedia content (audio, video, and pictures) can be performed across different multimedia terminal devices (such as mobile phones and televisions) over a wireless network connection, so that the same content can be shown on different terminals and content flows freely between all of them.
The prior art WO2016066079A1 discloses a multi-screen interaction method and system, including acquiring the position of a fixed terminal and monitoring the position of a mobile terminal; judging whether the mobile terminal is within a set range of the fixed terminal; and, if so, automatically pairing the mobile terminal with the fixed terminal to perform multi-screen interaction.
In existing interaction technology, whether a smart device is placed on a display screen is detected mainly with sensors such as a gyroscope and a gravity sensor, recognizing placement by checking whether changes in the sensor values match pre-designed features. This approach suffers a high misidentification rate when multiple smart devices interfere with one another.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method, based on a multilayer convolutional neural network, for detecting whether a device is placed on a display screen, with a low misidentification rate, high detection accuracy, and high detection speed.
The purpose of the invention can be realized by the following technical scheme:
a method of detecting, based on a multilayer convolutional neural network, whether a device is placed on a display screen, comprising:
placing a smart device equipped with an image acquisition device on a display screen that is in a lit state, with the image acquisition device facing the display screen;
constructing a detection model based on a multilayer convolutional neural network, and loading a set of weight parameters;
acquiring an image through the image acquisition device facing the display screen, inputting the image into the multilayer-convolutional-neural-network-based detection model for inference, and obtaining classification information and a probability value for the image;
and, for the given classification information and probability value, if the probability that the image belongs to the on-screen class is the highest or exceeds a set threshold, considering the smart device to be placed on the display screen.
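As an illustration of this decision rule, the following sketch shows the two acceptance conditions; the class names and the threshold value are assumptions for illustration only and are not specified by the claims:

```python
def is_on_screen(probs, threshold=0.9):
    """Apply the acceptance rule: the device is judged to be on the
    display screen if the on-screen class has the highest probability,
    or if its probability exceeds a set threshold.

    probs: dict mapping class name -> probability from the detection model.
    The class names and the 0.9 threshold are illustrative assumptions.
    """
    p_on = probs["on_screen"]
    return p_on == max(probs.values()) or p_on > threshold

print(is_on_screen({"on_screen": 0.72, "off_screen": 0.28}))  # True
print(is_on_screen({"on_screen": 0.40, "off_screen": 0.60}))  # False
```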
Preferably, constructing the detection model based on a multilayer convolutional neural network and loading the set of weight parameters specifically includes:
using the smart device to acquire images both when it is placed on the display screen and when it is not, and labeling each image with its category;
screening the images;
constructing a multilayer convolutional neural network;
and inputting the images into the multilayer convolutional neural network, training it until convergence, and acquiring the final set of weight parameters to obtain the final classification network model.
Preferably, acquiring images when the device is placed on the display screen specifically includes:
placing the smart device in different areas of the display screen while the display content changes continuously, varying the exposure and shutter parameters of the smart device's camera, varying the backlight brightness and motion compensation parameters of the display screen, and varying the ambient brightness; collecting sample images of the smart device placed on the display screen under these different conditions, ensuring sample diversity; and labeling the samples as the on-screen class.
Preferably, acquiring images when the device is not placed on the display screen specifically includes:
placing the smart device on the surface of an ordinary non-luminous object, on transparent glass, or on the surface of an ordinary luminous object, varying the exposure and shutter parameters of the smart device's camera, and varying the ambient brightness; collecting sample images of the smart device not placed on the display screen under these different conditions; and labeling them as the not-on-screen class.
Preferably, the screening of the images is specifically: judging whether the similarity of continuously captured images exceeds a set threshold, and if so, keeping only a set number of images of the same scene, ensuring a balance of images across scenes.
Preferably, the constructed multilayer convolutional neural network comprises convolutional layers, depthwise separable convolutional layers, batch normalization layers, pooling layers, a global average pooling layer, and fully connected layers.
Preferably, the method specifically comprises the following steps:
S401, selecting an optimization method for training the model;
S402, setting the hyper-parameters required for training the model;
S403, performing data enhancement during model training according to the characteristics of the training samples;
S404, selecting the softmax function to compute the final classification probability, where softmax is calculated as:

p_{ij} = exp(x_{ij}) / Σ_{c=1}^{C} exp(x_{ic})

where x_{ij} is the j-th output of the last layer of the neural network for the i-th sample, C is the number of classes, and p_{ij} is the probability that the i-th sample belongs to the j-th class;
S405, selecting the cross-entropy loss function to measure the difference between the model's predicted values and the true values, where the cross-entropy loss is calculated as:

L = -(1/n) Σ_{i=1}^{n} Σ_{j=1}^{C} y_{ij} · log(p_{ij})

where L is the loss value, n is the batch size, y_{ij} is the true label indicating whether the i-th sample belongs to the j-th class (y_{ij} = 1 if it does, otherwise y_{ij} = 0), C is the number of classes, and p_{ij} is the probability that the i-th sample belongs to the j-th class;
S406, loading a pre-trained model;
and S407, training the whole model on the training samples until convergence to obtain the final classification model.
Preferably, the data enhancement methods include random flipping, random cropping, and random hue variation.
Preferably, in step 3), the image may be transmitted over a network to a cloud server or another device, where inference is performed with the multilayer convolutional neural network.
Compared with the prior art, the invention has the following advantages:
1) a low misidentification rate: recognizing placement with a multilayer convolutional neural network effectively reduces misidentification when multiple smart devices interfere with one another;
2) high detection accuracy: collecting varied samples at the source ensures sample diversity, improves the training accuracy of the model, and thereby improves detection accuracy;
3) high detection speed: a lightweight multilayer convolutional neural network model improves detection speed while preserving detection accuracy.
Drawings
FIG. 1 is a diagram of the BasicBlock in the ShuffleNetV2 network structure;
FIG. 2 is a schematic diagram of the DownSampleBlock in the ShuffleNetV2 network structure;
FIG. 3 is a flow chart of a neural network model training step;
FIG. 4 is a comparison of images of a display captured by a mobile phone camera at different distances, where S501 is an image captured with the phone far from the display and S502 is an image captured with the phone close to the display.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of the present invention.
The invention discloses a method, based on a multilayer convolutional neural network, for detecting whether a device is placed on a display screen, which specifically comprises the following steps:
1. the smart device is provided with an image acquisition device, the display screen is in a lit state, and when the smart device is placed on the display screen the image acquisition device faces the display screen;
2. constructing a detection model based on a multilayer convolutional neural network, and loading a weight parameter set, wherein the steps are as follows:
a) the smart device is placed in different regions of the display screen; the display content of the display screen changes continuously; camera parameters of the smart device such as exposure and shutter are varied; display parameters such as backlight brightness and motion compensation are varied; and the ambient light brightness is varied, for example by shining a strong lamp at the display screen, turning off the lights, or moving to an outdoor environment. Sample images of the smart device placed on the display screen are collected under these various conditions, ensuring sample diversity, and the samples are labeled as the on-screen class;
b) the smart device is not placed on the display screen: it is not placed on any surface, or is placed on the surface of an ordinary non-luminous object, on transparent glass, or on the surface of an ordinary luminous object; camera parameters such as exposure and shutter are varied; and the ambient light brightness is varied, for example by shining a strong lamp, turning off the lights, or moving to an outdoor environment. Sample images of the smart device not placed on the display screen are collected under these various conditions, ensuring sample diversity, and the samples are labeled as the not-on-screen class;
c) screening the sample images: for continuously captured images, judging empirically whether their similarity is high, and if so, retaining only a small number of images of the same scene; this balances the images across scenes, reduces the number of sample images, and speeds up training;
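The screening in step c) can be sketched as follows; the mean-absolute-difference similarity measure, the thresholds, and the function names are illustrative assumptions, since the text only calls for an empirical similarity judgment:

```python
def frame_difference(img_a, img_b):
    """Mean absolute pixel difference between two equal-size grayscale
    frames (flat lists of 0-255 values); a small value means 'similar'."""
    return sum(abs(a - b) for a, b in zip(img_a, img_b)) / len(img_a)

def screen_images(frames, diff_threshold=10.0, keep_per_scene=2):
    """Keep at most keep_per_scene frames per run of similar frames;
    a large difference to the previous frame starts a new run (scene)."""
    kept, run = [], 0
    for i, frame in enumerate(frames):
        if i > 0 and frame_difference(frame, frames[i - 1]) <= diff_threshold:
            run += 1          # same scene as the previous frame
        else:
            run = 0           # first frame or scene change
        if run < keep_per_scene:
            kept.append(frame)
    return kept

frames = [[0] * 4, [0] * 4, [0] * 4, [0] * 4, [200] * 4]
print(len(screen_images(frames)))  # 3  (two near-duplicates dropped)
```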
d) constructing a multilayer neural network that includes, but is not limited to, convolutional layers (Conv), depthwise separable convolutional layers (DWConv), batch normalization layers, pooling layers, global average pooling layers, and fully connected layers. As shown in Table 1, FIG. 1, and FIG. 2, ShuffleNetV2 is a typical network structure that can run on a smart device: Table 1 gives the basic structure of ShuffleNetV2, FIG. 1 its basic building block, and FIG. 2 its downsampling block.
TABLE 1
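As background on why depthwise separable convolutions keep such a network lightweight, the following back-of-the-envelope parameter count compares a standard convolution with a depthwise separable one; the 116-channel width is merely a ShuffleNetV2-like example, and actual ShuffleNetV2 blocks also use channel split and channel shuffle:

```python
def conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def dwconv_params(k, c_in, c_out):
    """Weight count of a depthwise k x k convolution (one filter per
    input channel) followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# 3 x 3 convolution over 116 -> 116 channels
print(conv_params(3, 116, 116))    # 121104
print(dwconv_params(3, 116, 116))  # 14500  (roughly 8x fewer weights)
```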
e) inputting the images into the multilayer convolutional neural network, setting parameters such as the batch size, learning rate, and number of training iterations according to the resources of the training device, and training the network until convergence to obtain good accuracy and the final classification network model. The classification model may be trained from scratch, or fine-tuned from a model pre-trained on another data set such as ImageNet; fine-tuning from a pre-trained model speeds up convergence. As shown in FIG. 3, taking the training of a ShuffleNetV2-based classification model on a machine with a GPU 1080 Ti as an example, the training steps of the whole model are as follows:
S401, selecting an optimization method for training the model, such as mini-batch stochastic gradient descent with momentum;
S402, setting the hyper-parameters required for training: the batch size is set to 32, the initial learning rate to 0.001, and the momentum to 0.9; the number of training iterations is 100 × N, where N is the number of training samples; the weight decay is set to 4e-5; and the learning rate is decayed by a factor of 10 every 30 × N iterations;
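The step-decay schedule described in S402 can be sketched as follows; the function name and the per-iteration formulation are illustrative assumptions:

```python
def learning_rate(iteration, n_samples, base_lr=0.001, decay_every=30, factor=0.1):
    """Step-decay schedule from S402: starting from base_lr, the
    learning rate is divided by 10 every 30 x N iterations, where
    N is the number of training samples."""
    drops = iteration // (decay_every * n_samples)
    return base_lr * (factor ** drops)

# with N = 1000 training samples
print(learning_rate(0, 1000))       # 0.001
print(learning_rate(30_000, 1000))  # ~0.0001 (first decay boundary)
print(learning_rate(65_000, 1000))  # ~0.00001
```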
S403, to improve the generalization of the model and prevent overfitting, selecting data enhancement methods during model training according to the characteristics of the training samples, such as random flipping, random cropping, and random hue variation;
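Two of the listed enhancements, random flipping and random cropping, can be sketched in pure Python on an image stored as a list of rows; this is a minimal illustration, and real pipelines operate on image tensors and also vary hue:

```python
import random

def random_flip(img, rng):
    """Horizontally flip a 2D image (list of rows) with probability 0.5."""
    return [row[::-1] for row in img] if rng.random() < 0.5 else img

def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randrange(len(img) - size + 1)
    left = rng.randrange(len(img[0]) - size + 1)
    return [row[left:left + size] for row in img[top:top + size]]

rng = random.Random(0)                              # fixed seed for repeatability
img = [[10 * r + c for c in range(4)] for r in range(4)]
crop = random_crop(random_flip(img, rng), 3, rng)
print(len(crop), len(crop[0]))                      # 3 3
```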
S404, selecting the softmax function to compute the final classification probability, where softmax is calculated as:

p_{ij} = exp(x_{ij}) / Σ_{c=1}^{C} exp(x_{ic})

where x_{ij} is the j-th output of the last layer of the neural network for the i-th sample, C is the number of classes, and p_{ij} is the probability that the i-th sample belongs to the j-th class;
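The softmax of S404 can be computed directly; in this sketch, subtracting the maximum logit before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import math

def softmax(logits):
    """p_ij = exp(x_ij) / sum_c exp(x_ic) for one sample's last-layer
    outputs; subtracting the max before exponentiating avoids overflow
    without changing the result."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0])   # e.g. on-screen vs. not-on-screen logits
print(probs[0] > probs[1])    # True  (larger logit -> larger probability)
```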
S405, selecting the cross-entropy loss function to measure the difference between the model's predicted values and the true values, where the cross-entropy loss is calculated as:

L = -(1/n) Σ_{i=1}^{n} Σ_{j=1}^{C} y_{ij} · log(p_{ij})

where L is the loss value, n is the batch size, y_{ij} is the true label indicating whether the i-th sample belongs to the j-th class (y_{ij} = 1 if it does, otherwise y_{ij} = 0), C is the number of classes, and p_{ij} is the probability that the i-th sample belongs to the j-th class;
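The cross-entropy loss of S405, for one-hot labels y_{ij} and predicted probabilities p_{ij}, can be sketched as:

```python
import math

def cross_entropy(y_true, y_pred):
    """L = -(1/n) * sum_i sum_j y_ij * log(p_ij): y_true holds one-hot
    labels per sample, y_pred the predicted class probabilities."""
    n = len(y_true)
    total = 0.0
    for yi, pi in zip(y_true, y_pred):
        # only terms with y_ij = 1 contribute, so log(0) is never taken
        total += sum(y * math.log(p) for y, p in zip(yi, pi) if y)
    return -total / n

# a perfect prediction costs nothing; a confident wrong one costs a lot
print(cross_entropy([[1, 0]], [[1.0, 0.0]]) == 0.0)   # True
print(cross_entropy([[1, 0]], [[0.1, 0.9]]) > 2.0)    # True (-ln 0.1 ≈ 2.30)
```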
S406, optionally loading a pre-trained model;
and S407, training the whole model on the training samples until convergence to obtain the final classification model.
3. on the smart device, an image is acquired through the image acquisition device facing the display screen and input into the multilayer-convolutional-neural-network-based detection model for inference, which gives the classification information and probability values of the image; optionally, the image can be transmitted over a network to a cloud server or another device, where inference is performed with the multilayer convolutional neural network;
4. for the classification information and probability values given by the detection model, if the probability that the image belongs to the on-screen class is the highest or exceeds a set threshold, the smart device is considered to be placed on the display screen.
FIG. 4 is a comparison of images of a display captured by a mobile phone camera at different distances, where S501 is an image captured with the phone far from the display and S502 is an image captured with the phone close to the display; it can be seen that S502 exhibits obvious grid-like patterns.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.