Technical Field
The present invention relates to a video detection method, and in particular to a hand-raising detection method based on deep learning.
Background Art
Detecting moving human bodies and recognizing their behavior in video sequences is a research topic that spans computer vision, pattern recognition, and artificial intelligence. Because of its wide application value in fields such as commerce, medicine, and the military, it has long been a research hotspot. However, owing to the diversity and non-rigidity of human behavior and the inherent complexity of video images, devising a method that is robust, real-time, and accurate remains difficult.
Due to noisy and highly dynamic backgrounds, varying lighting conditions, small target sizes, and multiple possible matching objects, detecting hand-raising motions in a typical classroom environment is a challenging task.
The document "Haar-Feature Based Gesture Detection of Hand-Raising for Mobile Robot in HRI Environments" discloses a hand-raising detection technique based on Haar features. The method first trains two classifiers: a face detector scans all positions of the input image to find people, and a hand-raising detector then scans a specific region around each face to detect whether a hand is raised. The method is divided into a training phase and a detection phase. The training phase specifically includes: (1) creating samples, where the training samples are divided into positive samples (samples of the target to be detected) and negative samples (arbitrary other images); (2) feature extraction, covering edge features, line features, and center features; (3) cascaded AdaBoost training, carried out by calling OpenCV's opencv_traincascade program. Training produces an .xml model file, and the resulting AdaBoost cascade classifier can detect hand-raising actions, which is the key to the whole detection technique. The detection phase specifically includes: (1) splitting the video into frames and performing face detection; (2) selecting a region of interest constrained by the detected face; (3) using the trained cascade classifier to detect hand-raising within the region of interest.
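For concreteness, the detection phase of this prior-art pipeline can be sketched with OpenCV's cascade classifier API roughly as follows. This is a minimal sketch: only the face cascade ships with OpenCV, the hand-raising cascade file name is hypothetical, and the region-of-interest geometry is one plausible hand-tuned choice.

```python
# Sketch of the prior-art detection phase: face detection followed by
# hand-raising detection in a region of interest around each face.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
hand_cascade = cv2.CascadeClassifier("hand_raising_cascade.xml")  # hypothetical model file

def detect_hand_raising(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detections = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3):
        # Hand-tuned region of interest: a band around and above the face.
        x0, y0 = max(x - 2 * w, 0), max(y - 2 * h, 0)
        roi = gray[y0:y + h, x0:x + 3 * w]
        for (hx, hy, hw, hh) in hand_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=3):
            detections.append((x0 + hx, y0 + hy, hw, hh))  # back to frame coordinates
    return detections
```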
Although the above method can produce detection results, it still has several shortcomings: (1) it requires face detection, and the quality of the face detection directly affects the final hand-raising detection; (2) selecting the region of interest requires repeated trial and error, and a new selection scheme must be devised for each new detection environment, making the results non-robust; (3) hand-raising detection based on Haar features performs poorly, with low precision and low recall.
Summary of the Invention
The purpose of the present invention is to overcome the above-mentioned defects of the prior art by providing a hand-raising detection method based on deep learning.
The first object of the present invention is to detect hand-raising actions in complex environments, such as classrooms.
The second object of the present invention is to improve the precision of hand-raising detection.
The third object of the present invention is to improve the recall of hand-raising detection.
The fourth object of the present invention is to merge the same hand-raising action across different frames so as to obtain a more realistic count of raised hands.
The purpose of the present invention can be achieved through the following technical solutions:
A hand-raising detection method based on deep learning, comprising the following steps:
1) collecting samples, the samples being samples from complex environments;
2) building a hand-raising detection model, the model being based on a convolutional neural network structure and trained on the samples with the R-FCN object detection algorithm;
3) using the trained hand-raising detection model to perform hand-raising detection on the video under test and obtain the positions of the hand-raising bounding boxes.
Further, in step 1), the number of samples is greater than 30,000.
Further, step 1) also includes: saving the sample information, the sample information including the video key-frame images, the key-frame image information, and the bounding-box coordinates of the hand-raising targets in the key-frame images.
Further, step 1) also includes: clustering the sample sizes to obtain the template sizes required by the training process.
Further, the convolutional neural network structure includes an intermediate-level fusion layer.
Further, the method also includes the step of:
4) using a tracking algorithm to merge the same hand-raising action across different frames.
Further, step 4) specifically comprises:
401) obtaining the first image frame and the coordinates of the detected hand-raising boxes, establishing one tracklet array for each hand-raising box, and initializing its state to ALIVE;
402) obtaining the next image frame and judging whether a camera view change has occurred; if so, changing the state of all tracklet arrays to DEAD, establishing new tracklet arrays, and returning to step 402); if not, executing step 403);
403) traversing all hand-raising boxes detected in the current image frame and using the tracking algorithm to select the best-matching tracklet array for each hand-raising box;
404) for each tracklet array not matched in the current image frame, judging whether its state is ALIVE; if so, changing the state to WAIT; if not, changing it to DEAD; then returning to step 402) until all image frames have been processed.
Further, judging whether a camera view change has occurred specifically comprises:
obtaining two adjacent image frames and counting the number of pixels whose rate of change between the two frames exceeds a first threshold; judging whether the number of changed pixels is greater than a second threshold; if so, determining that a camera view change has occurred, and if not, that no camera view change has occurred.
Further, the method also includes the step of:
5) counting the detected and merged hand-raising actions.
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention uses video images from complex environments as samples to train the hand-raising detection model, so that the method is applicable to hand-raising detection in complex environments and adapts well to comparatively complex backgrounds.
2. The hand-raising detection model proposed by the present invention is a deep learning model trained on a large number of samples (more than 30,000 hand-raising samples), giving high precision; extensive testing shows a precision above 90%.
3. The template sizes required by the training process of the present invention are obtained by clustering the sample sizes rather than by manual selection, which effectively improves the performance of the model.
4. The template-size clustering and the fusion of intermediate network levels in the present invention ensure the recall of the model; extensive testing shows a recall above 70%.
5. The tracking algorithm used in the present invention can effectively track the same hand-raising action across different frames, so the true number of hand raises can be obtained, providing a basis for further analysis and evaluation.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the present invention;
Fig. 2 is a schematic flowchart of the sample-size clustering of the present invention;
Fig. 3 is a schematic diagram of the fusion of intermediate network layers;
Fig. 4 is a schematic diagram of the network structure of the hand-raising detection model of the present invention;
Fig. 5 is a schematic flowchart of the merging of hand-raising actions in the present invention;
Fig. 6 is a schematic flowchart of the shot-boundary judgment of the present invention;
Fig. 7 shows detection results from the embodiment.
Detailed Description of the Embodiments
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the protection scope of the present invention is not limited to the following embodiments.
As shown in Fig. 1, the present invention provides a hand-raising detection method based on deep learning, comprising the following steps:
1) Collect samples. The samples are from complex environments, and the number of samples is greater than 30,000.
After the samples are collected, the sample information needs to be saved, including the video key-frame images, the key-frame image information, and the bounding-box coordinates of the hand-raising targets in the key-frame images.
The sample information can be saved in the format of the PASCAL VOC dataset. PASCAL VOC provides a complete set of standardized, high-quality datasets for image recognition and classification. Folders saved in this format include JPEGImages, Annotations, and so on, where JPEGImages stores the key-frame images of the video and Annotations stores the detailed information of the corresponding images together with the bounding-box coordinates of the hand-raising targets in each image; each hand-raising box is marked by its upper-left and lower-right corner coordinates.
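As an illustration, a single VOC-style annotation file can be read as follows. This is a minimal sketch; the class label "hand_raising" is an assumed name, since the patent does not fix one.

```python
# Minimal sketch: extract hand-raising bounding boxes from one PASCAL VOC
# Annotations/*.xml file as (xmin, ymin, xmax, ymax) tuples.
import xml.etree.ElementTree as ET

def load_boxes(annotation_path, label="hand_raising"):  # label name is assumed
    root = ET.parse(annotation_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        if obj.findtext("name") == label:
            b = obj.find("bndbox")
            boxes.append((int(b.findtext("xmin")), int(b.findtext("ymin")),
                          int(b.findtext("xmax")), int(b.findtext("ymax"))))
    return boxes
```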
Templates (anchors) are needed during model training; in the present invention the template sizes are obtained by clustering the sample sizes. In some embodiments, the k-means algorithm is used to cluster the sample sizes, and the 9 most representative sizes are selected as the templates.
The distance metric in k-means is redefined here as:
d(box, centroid) = 1 - IOU(box, centroid)
where d(box, centroid) denotes the distance between a bounding box box and a cluster centroid centroid, and IOU(box, centroid) denotes the corresponding overlap ratio.
In the above formula, IOU (Intersection over Union) denotes the overlap ratio between a template anchor (the box) and a pre-labeled hand-raising ground-truth box (the centroid), defined as:
IOU(box, centroid) = area(box ∩ centroid) / area(box ∪ centroid)
As shown in Fig. 2, the clustering process can be described in pseudocode as:
Require: the pre-labeled hand-raising bounding boxes
Ensure: the 9 most representative sizes, output as the template sizes
1: k = 9
2: select k boxes as the initial centroids
3: repeat
4:   compute distances by the formula d(box, centroid) = 1 - IOU(box, centroid)
5:   assign each bounding box to its nearest centroid, forming k clusters
6:   recompute the centroid of each cluster
7: until the clusters no longer change
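A runnable sketch of this clustering follows, assuming each sample box is reduced to a (width, height) pair so that IOU can be computed as if all boxes shared a common corner (a convention commonly used for anchor clustering; the patent does not spell out this detail):

```python
# IoU-based k-means over box sizes: the distance is d = 1 - IoU, so the
# nearest centroid is the one with the highest IoU.
import numpy as np

def iou_wh(boxes, centroids):
    # boxes: (N, 2), centroids: (k, 2); IoU of (w, h) pairs anchored at a common corner.
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def cluster_anchors(boxes, k=9, seed=0):
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].astype(float)
    while True:
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)      # minimizes 1 - IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])     # keep empty clusters in place
        if np.allclose(new, centroids):
            return new                                            # clusters no longer change
        centroids = new
```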
2) Build the hand-raising detection model. The model is based on a convolutional neural network structure and is trained on the samples with the R-FCN object detection algorithm. The convolutional neural network structure includes an intermediate-level fusion layer, which enriches the features extracted by the network and thereby improves detection precision.
In some embodiments, the convolutional neural network structure uses a modified version of ResNet-101, with C1, C2, C3, C4, C5 denoting the outputs of conv1, conv2, conv3, conv4, conv5 of ResNet-101, respectively. As convolutional layers are stacked, the receptive field of each kernel grows and the learned semantic features become more abstract, but subtle details are more easily lost. In some environments the hand-raising targets have low resolution, so to detect small targets correctly we superimpose the outputs of C3 and C5, giving the features learned at the C5 level both high-level semantic information and low-level detail. As shown in Fig. 3, res5c_relu is the output of C5 and C5_topdown is an upsampling layer that brings C5 up to the same size as C3; C5_topdown is then added to C3 to obtain the P3 layer, and P3 replaces res5c_relu as the output of C5, enriching the features extracted by the convolutional neural network.
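A minimal PyTorch sketch of this fusion is given below. Channel counts follow standard ResNet-101 (C3 has 512 channels, C5 has 2048); the 1x1 lateral convolution used to reconcile channel counts is an assumption, since the patent does not specify how the two maps are made addable.

```python
# C3/C5 fusion: upsample C5 to C3's spatial size, project C3 to C5's channel
# count, then add elementwise to form P3 (which replaces res5c_relu).
import torch
import torch.nn as nn
import torch.nn.functional as F

class C3C5Fusion(nn.Module):
    def __init__(self, c3_channels=512, c5_channels=2048):
        super().__init__()
        # Assumed detail: a 1x1 conv lifts C3 to C5's channel count.
        self.lateral = nn.Conv2d(c3_channels, c5_channels, kernel_size=1)

    def forward(self, c3, c5):
        # C5_topdown: upsample C5 to C3's spatial resolution.
        c5_topdown = F.interpolate(c5, size=c3.shape[-2:], mode="nearest")
        return c5_topdown + self.lateral(c3)  # P3: high-level semantics + low-level detail

# Example: p3 = C3C5Fusion()(torch.randn(1, 512, 80, 80), torch.randn(1, 2048, 20, 20))
```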
The feature extraction network adopts ResNet-101, and after the feature maps of the intermediate network levels have been fused, the R-FCN object detection algorithm is used for model training. First, a stack of basic conv + relu + pooling layers extracts the feature maps of the image; these feature maps are shared by the subsequent RPN network and detection network. The RPN network generates region proposals: it uses softmax to classify anchors as foreground or background, then applies bounding-box regression to refine the anchors into accurate proposals. The RoI pooling layer collects the input feature maps and proposals, extracts the proposal feature maps, and computes position-sensitive score maps, which are fed into the subsequent detection network to determine the target category. Finally, the proposal feature maps are used to classify each proposal and obtain the final precise position of the detection box.
ResNet-101 comprises 5 convolutional blocks with 101 layers in total. The original R-FCN uses the first 4 convolutional blocks as the shared-weight network of the RPN network and the detection network, with the fifth convolutional block serving as the feature-extraction network of the detection network. The present invention instead uses all 101 layers as the shared-weight network of the RPN network and the detection network, so the feature map output by the fifth convolutional block is shared between them; this preserves precision while greatly reducing the amount of computation.
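To illustrate the position-sensitive score maps at the heart of R-FCN, a simplified NumPy sketch of position-sensitive RoI pooling follows (k = 3 bins per side; for hand-raising detection there is one foreground class, so C + 1 = 2; this is a conceptual sketch, not the patent's implementation):

```python
# Simplified position-sensitive RoI pooling: bin (i, j) of the RoI is
# average-pooled from its own dedicated channel group, and the k*k bins
# then vote by averaging to give per-class scores for the RoI.
import numpy as np

def psroi_pool(score_maps, roi, k=3, num_classes=1):
    # score_maps: (k*k*(C+1), H, W); roi: (x0, y0, x1, y1) in feature-map coordinates.
    x0, y0, x1, y1 = roi
    bin_w, bin_h = (x1 - x0) / k, (y1 - y0) / k
    scores = np.zeros(num_classes + 1)
    for i in range(k):
        for j in range(k):
            ys = slice(int(y0 + i * bin_h), int(np.ceil(y0 + (i + 1) * bin_h)))
            xs = slice(int(x0 + j * bin_w), int(np.ceil(x0 + (j + 1) * bin_w)))
            g = (i * k + j) * (num_classes + 1)          # this bin's channel group
            scores += score_maps[g:g + num_classes + 1, ys, xs].mean(axis=(1, 2))
    return scores / (k * k)                              # averaged vote over all bins
```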
The network of the hand-raising detection model is shown in Fig. 4.
3) Use the trained hand-raising detection model to perform hand-raising detection on the video under test and obtain the positions of the hand-raising boxes.
In some embodiments, the method also includes step 4): tracking the hand-raising action in the next frame based on its position in the previous frame, and using a tracking algorithm to merge the same hand-raising action across different frames. When the camera view does not change, the tracking algorithm can follow the same hand-raising action across frames. The tracking algorithm can use a backtracking-with-pruning method to optimally match the hand-raising actions of the previous frame with those of the next frame.
Step 4) specifically comprises:
401) Obtain the first image frame and the coordinates of the detected hand-raising boxes; establish one tracklet array for each hand-raising box and initialize its state to ALIVE;
402) obtain the next image frame and judge whether a camera view change has occurred; if so, change the state of all tracklet arrays to DEAD, establish new tracklet arrays, and return to step 402); if not, execute step 403);
403) traverse all hand-raising boxes detected in the current image frame and use the backtracking-with-pruning method to select the best-matching tracklet array for each hand-raising box;
404) for each tracklet array not matched in the current image frame, judge whether its state is ALIVE; if so, change the state to WAIT; if not, change it to DEAD; then return to step 402) until all image frames have been processed.
The above process can be summarized in pseudocode as:
Require: a set of N images, together with the hand-raising bounding boxes detected in each
Ensure: the output tracklets
The process of merging hand-raising actions for a single image frame is shown in Fig. 5.
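A condensed sketch of this state machine is given below. For brevity, greedy IoU matching stands in for the backtracking-with-pruning matcher described above, and the box format, threshold, and field names are assumptions:

```python
# Tracklet state machine for steps 401)-404); each tracklet keeps its box
# history and one of the three states ALIVE / WAIT / DEAD.
ALIVE, WAIT, DEAD = "ALIVE", "WAIT", "DEAD"

def box_iou(a, b):
    # a, b: (x0, y0, x1, y1)
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_tracklets(frames_boxes, shot_changed, iou_thresh=0.3):
    """frames_boxes: per-frame lists of boxes; shot_changed(t): view-change test."""
    tracklets = []
    for t, boxes in enumerate(frames_boxes):
        if t > 0 and shot_changed(t):
            for tr in tracklets:            # step 402): view change kills everything
                tr["state"] = DEAD
            tracklets += [{"boxes": [b], "state": ALIVE} for b in boxes]
            continue
        matched = set()
        for b in boxes:                     # step 403): best live tracklet per box
            cands = [(box_iou(tr["boxes"][-1], b), i)
                     for i, tr in enumerate(tracklets)
                     if tr["state"] != DEAD and i not in matched]
            score, best = max(cands, default=(0.0, None))
            if best is not None and score >= iou_thresh:
                tracklets[best]["boxes"].append(b)
                tracklets[best]["state"] = ALIVE
                matched.add(best)
            else:                           # no match: a new hand-raise begins
                tracklets.append({"boxes": [b], "state": ALIVE})
                matched.add(len(tracklets) - 1)
        for i, tr in enumerate(tracklets):  # step 404): demote unmatched tracklets
            if i not in matched and tr["state"] != DEAD:
                tr["state"] = WAIT if tr["state"] == ALIVE else DEAD
    return tracklets
```

With tracklets assembled this way, the count in step 5) reduces to the number of tracklets ever created, since each tracklet corresponds to one merged hand-raising action.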
Camera-based video capture may involve camera view changes. The present invention handles this with the frame-difference method, i.e., subtracting consecutive frames. As shown in Fig. 6, judging whether a camera view change has occurred specifically comprises:
obtaining two adjacent image frames and counting the number of pixels whose rate of change between the two frames exceeds a first threshold; judging whether the number of changed pixels is greater than a second threshold; if so, determining that a camera view change has occurred, and if not, that no camera view change has occurred.
Concretely, the judgment checks whether the white region (i.e., the moving region) exceeds 20% of the total pixels; if it does, a shot change is declared.
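A minimal OpenCV sketch of this test follows. The 20% ratio comes from the text above; the per-pixel difference threshold of 30 grey levels is an assumed value, since the patent does not give the first threshold numerically.

```python
# Frame-difference shot-boundary test: count pixels whose absolute grey-level
# change exceeds the first threshold, then compare their share against 20%.
import cv2
import numpy as np

def shot_changed(prev_frame, cur_frame, pixel_thresh=30, ratio_thresh=0.20):
    prev = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    cur = cv2.cvtColor(cur_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(prev, cur)
    changed = np.count_nonzero(diff > pixel_thresh)   # the "white" (moving) pixels
    return changed / diff.size > ratio_thresh         # view change if > 20% moved
```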
Based on the above merging process, the method may further include step 5): counting the detected and merged hand-raising actions.
Embodiment 1
This embodiment illustrates the above method using a primary and secondary school classroom environment as an example. 40,000 samples were collected, and the hand-raising samples were prepared in the PASCAL VOC dataset format. Clustering the sample sizes finally yields the following 9 anchor-box sizes:
(37,59) (44,72) (53,80) (56,96) (67,105) (75,128) (91,150) (115,184) (177,283).
The training process in this embodiment iterated 20,000 times in total, yielding a hand-raising detection model with good performance. Some detection results of the trained model are shown in Fig. 7.
After the tracking algorithm merges the hand-raising actions across frames, the counts are tallied: the number of hand-raising actions over the whole lesson is recorded, completing the count of hand raises in a class. This can be used to assess the classroom atmosphere and provides a basis for its intelligent analysis.
Testing shows that the above method achieves high precision and recall for hand-raising detection: precision above 90% and recall above 70%.
The preferred specific embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning, or limited experiment in accordance with the concept of the present invention shall fall within the protection scope defined by the claims.