CN119693840A - A multi-person sports event counting method, system and medium based on multi-key point detection - Google Patents

A multi-person sports event counting method, system and medium based on multi-key point detection

Info

Publication number
CN119693840A
Authority
CN
China
Prior art keywords
person
detected
state
key point
counting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411668090.1A
Other languages
Chinese (zh)
Inventor
高铁杠
王昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nankai University
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University
Priority to CN202411668090.1A
Publication of CN119693840A

Abstract

The present invention discloses a multi-person sports event counting method, system and medium based on multi-key point detection. The method comprises: acquiring video stream data of the detection venue in real time through a camera; preprocessing the video frames in the stream to obtain the frames finally used for key point detection; detecting the bounding box of each person to be detected in a frame and cropping the image region used for key point detection, while tracking the persons across frames; detecting the coordinates of 33 human body key points; and completing the count for the relevant sports event. The invention can accurately identify the key point information of each person to be detected at different resolutions, light intensities and venue backgrounds, and accurately complete the count for the corresponding event. It offers high accuracy, good generalization and strong extensibility, and can be widely applied to counting, timing and similar tasks in sports-related fields.

Description

Multi-person sports event counting method, system and medium based on multi-key point detection
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a multi-person sports event counting method, device and medium based on multi-key point detection.
Background
Sports counting (and timing) is widely used in contexts such as competitions and physical education examinations. Existing counting methods fall into three types. The first is traditional manual counting, which is time-consuming and labor-intensive and can be affected by the subjective judgment of the counting staff. The second uses physical devices with multiple sensors to assist counting; this greatly reduces labor cost, but a large number of devices must still be purchased, so the cost remains high. The third is a purely vision-based scheme built on deep neural networks, which needs only a device that can record video of the venue and a computing device to achieve counting with high accuracy and strong robustness.
Current computer-vision counting schemes still have problems: most support only single-person detection, or detect too few human body key points to meet the needs of event counting. There are two main approaches to human key point detection. The top-down approach first detects the bounding box of each person and then detects key points for the person inside each box. It supports multi-person key point detection, but existing schemes detect few key points; YOLO-POSE, for example, detects only 17, which cannot meet the needs of fine-grained counting algorithms, and its model structure performs weakly on small objects, making it difficult to meet the requirements of the key point detection task. The bottom-up approach directly identifies the key points in the image and selects the most probable ones to assemble a human skeleton. In the existing schemes considered here, this approach can detect a large number of key points but cannot detect multiple people. In addition, multi-person counting requires the identity of each person to be determined, that is, each person should have a unique and unchanging ID, so accurate multi-person detection and counting is difficult to achieve with a key point detection algorithm alone.
Disclosure of Invention
To address the problems in the prior art, the invention provides a multi-person sports event counting method, system and medium based on multi-key point detection.
In order to solve the technical problems, the invention adopts the following technical scheme:
A multi-person sports event counting method based on multi-key point detection comprises the following implementation steps:
1) Video stream data of a venue is acquired.
2) Video frames in the video stream are preprocessed to obtain the frames finally used for key point detection. The preprocessing comprises sampling the video frames, buffering them, and scaling them where necessary.
3) The bounding box of each person to be detected is located in the video frame and the image region used for key point detection is cropped. A target detection model based on YOLO V8 is used to detect the bounding boxes; the model has a fast inference speed and can meet the requirement of real-time detection. The cropped region of each identified person is used both for the subsequent key point detection and for the subsequent target tracking task. After target detection is completed, the persons to be detected are tracked with the DeepSort algorithm, which assigns each person a unique and unchanging ID to facilitate the subsequent counting.
4) The coordinates of 33 human body key points are detected, comprising 11 on the head, 10 on the arms (both sides), 4 on the torso, 2 on the legs (both sides) and 6 on the feet (both sides); the pose detection module of mediapipe is used.
5) The count is completed for the relevant sports event. Taking sit-up counting as an example, four states are defined for a person to be detected, namely a ready state, a supine state, a sit-up state and an end state, and the transitions between states are judged by corresponding rules. One sit-up is completed when the person goes through the full transition from the supine state to the sit-up state and back to the supine state.
A multi-person sports event counting system based on multi-key point detection, comprising a computer device, wherein the computer device is programmed or configured to perform the steps of the multi-person sports event counting method based on multi-key point detection described in any one of the above, or a storage medium of the computer device stores a computer program programmed or configured to perform that method.
A computer-readable storage medium having stored thereon a computer program programmed or configured to perform the multi-person sports event counting method based on multi-key point detection described in any one of the above.
Compared with the prior art, the invention has the following advantages. Preprocessing the video stream prevents the stream buffer from overflowing when the processing speed is insufficient. Multi-person, multi-key point detection is realized, improving the accuracy and precision of the counting algorithm while maintaining key point detection efficiency. Because DeepSort is used for target tracking, tracking failures caused by momentary occlusion can be effectively avoided, and sports counting for multiple persons can be completed accurately across scenes with different backgrounds, illumination intensities and numbers of people.
Drawings
Fig. 1 is a schematic diagram of the basic flow of the method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of the basic flow of the tracking algorithm according to an embodiment of the present invention.
Fig. 3 is a network architecture diagram of the feature extraction network of the DeepSort algorithm.
Fig. 4 is a key point labeling diagram for detecting key points of a human body according to an embodiment of the present invention.
Fig. 5 is a state transition diagram of a sit-up counting algorithm according to an embodiment of the present invention.
Detailed Description
As shown in Fig. 1, the multi-person sports event counting method based on multi-key point detection in this embodiment comprises the following implementation steps:
1. Video stream data of the venue is acquired, typically over a wired or wireless connection from a camera that supports streaming, such as one supporting RTSP, RTMP or an encoded video stream. The method also supports local camera devices, and can carry out the corresponding counting (timing) on stored video to be detected.
2. Video frames in the video stream are preprocessed to obtain the frames finally used for key point detection. The detailed preprocessing steps are as follows:
2.1 Video frames in the original video stream are uniformly sampled; for video streams at 30 FPS or above, the frame rate after sampling is 20 frames per second.
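The uniform sampling in 2.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper name `keep_frame` and the integer-accumulator scheme are hypothetical, chosen so that a 30 FPS stream is thinned to 20 FPS by keeping two of every three frames.

```python
def keep_frame(index: int, src_fps: int, dst_fps: int) -> bool:
    """Return True if frame `index` (0-based) of the source stream is kept.

    Hypothetical helper: a frame is kept whenever the integer number of
    output frames due so far increases, which spreads kept frames evenly.
    """
    if dst_fps >= src_fps:
        return True  # nothing to drop
    due_before = (index * dst_fps) // src_fps
    due_after = ((index + 1) * dst_fps) // src_fps
    return due_after > due_before


# One second of a 30 FPS stream thinned to 20 FPS.
kept = [i for i in range(30) if keep_frame(i, 30, 20)]
```

Integer arithmetic avoids the drift that accumulating a fractional step such as 1.5 could introduce over a long stream.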
2.2 A data structure called FramePool is created, whose size is determined by the hardware conditions of the specific computing device, such as memory. The frames sampled in 2.1) are stored in the FramePool. When the FramePool is full and a new frame arrives, the frame that entered the FramePool first is deleted and the new frame is stored. The subsequent processing stage likewise takes the earliest-entered frame from the FramePool. These operations are all asynchronous, and a corresponding locking mechanism ensures that FramePool access is correct.
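A FramePool with the behavior described in 2.2 might look like the sketch below. Only the name FramePool comes from the text; the method names `put` and `get` and the use of a bounded `deque` guarded by a `threading.Lock` are assumptions made for illustration.

```python
import threading
from collections import deque


class FramePool:
    """Bounded FIFO frame buffer; the oldest frame is dropped when full."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) discards from the left on append once full,
        # i.e. it evicts the earliest-entered frame.
        self._frames = deque(maxlen=capacity)
        self._lock = threading.Lock()

    def put(self, frame) -> None:
        with self._lock:
            self._frames.append(frame)

    def get(self):
        """Return the earliest-entered frame, or None if the pool is empty."""
        with self._lock:
            return self._frames.popleft() if self._frames else None

    def __len__(self) -> int:
        with self._lock:
            return len(self._frames)


pool = FramePool(capacity=3)
for frame_id in range(5):   # frames 0 and 1 are evicted
    pool.put(frame_id)
oldest = pool.get()         # frame 2 is now the earliest one left
```

In use, a capture thread would call `put` while a processing thread calls `get`, so a slow consumer costs dropped frames rather than unbounded memory growth.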
2.3 For video frames taken from the FramePool, if the resolution is higher than 1280 × 720, the frame is scaled down to 1280 × 720 to ease subsequent processing. After this preprocessing, the problem of video stream buffer overflow caused by insufficient computing power of the computing device is effectively avoided.
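The size rule in 2.3 reduces to a small decision function. The sketch below is illustrative: the name `target_size` is hypothetical, and it assumes that "higher than 1280 × 720" means either dimension exceeds the limit.

```python
MAX_W, MAX_H = 1280, 720


def target_size(width: int, height: int) -> tuple[int, int]:
    """Size a frame should have after step 2.3.

    Assumption: a frame is scaled when either dimension exceeds the
    limit; smaller frames are returned unchanged.
    """
    if width > MAX_W or height > MAX_H:
        return (MAX_W, MAX_H)
    return (width, height)
```

The returned pair would then be handed to an image library's resize call, for example OpenCV's `cv2.resize(frame, (w, h))`.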
3. The bounding box of each person to be detected is located in the video frame, the image region used for key point detection is cropped, and the persons to be detected are tracked at the same time. The specific implementation steps are as follows:
3.1 A target detection model based on YOLO V8 is used to detect the persons to be detected in the video frame. The model's Backbone uses the C2f module as its basic structural unit, comprising 5 convolution modules and 4 C2f modules; compared with the previous generation's C3 module, it has fewer parameters and better feature extraction capability, which better suits real-time detection on video stream data. The Neck uses multi-scale feature fusion to fuse feature maps from different stages of the Backbone and enhance feature representation; this part plays an important role in feature extraction and feature fusion. The Head is responsible for the final target detection and classification tasks and comprises a detection head and a classification head. The detection head contains a series of convolution and deconvolution layers that generate the detection results, and the classification head uses global average pooling to classify each feature map. Using the resulting bounding box information, the region of the image in which key points are to be detected, that is, the region containing a person to be detected, is cropped out.
3.2 Using the bounding box information obtained in 3.1) and the cropped region images, the persons to be detected in the video frame are tracked with the DeepSort algorithm. The specific implementation steps are shown in Fig. 2. Features are extracted from the images cropped in 3.1) and are later used for cascade matching. The structure of the feature extraction network is shown in Fig. 3. The front convolution layer consists of a 2D convolution and a BN layer; after ReLU and max pooling, its output is processed by convolution layers built from basic blocks. Each basic block consists of 2D convolution layers followed by BN, with ReLU as the activation function, and offers an optional downsampling operation. The backbone network consists of 8 basic blocks and an average pooling layer. The basic blocks are grouped in pairs, and if a group is selected for downsampling, only the first block of the group downsamples; in the current structure these are the 3rd, 5th and 7th basic blocks. The classifier consists of a linear layer and a BN layer, followed, after ReLU activation and Dropout, by a final linear classification layer that produces the result. The extracted features are used for the subsequent cascade matching.
A track of a bounding box has two states, confirmed and unconfirmed. An unconfirmed track can be converted into a confirmed track under certain conditions, and the two kinds of track are handled by different rules in the subsequent matching process. Each existing track predicts its subsequent position through Kalman filtering (hereinafter KF).
For a person bounding box detected for the first time, the bounding box is initialized as a new track; all tracks are in the unconfirmed state when created. For an unconfirmed track, its KF-predicted box is IOU-matched against the detection boxes of the subsequent video frame, and the Hungarian algorithm yields three kinds of matching result. If a track is not matched, it is deleted (it can be deleted directly because it is unconfirmed). If a detection box is not matched, it is initialized as a new track. If a track and a detection box are matched, the track is updated according to the position of the detection box.
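The IOU matching and its three outcomes can be illustrated with the sketch below. It is not the patent's code: the function names are hypothetical, the `min_iou` gate of 0.3 is an assumed value, and an exhaustive search over assignments stands in for the Hungarian algorithm (it finds the same optimal assignment, but is only practical for a handful of boxes).

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match(tracks, detections, min_iou=0.3):
    """Return (matches, unmatched_tracks, unmatched_detections) as indices.

    Brute-force stand-in for the Hungarian algorithm: every one-to-one
    assignment is tried and the one with the highest total IoU is kept.
    """
    n_t, n_d = len(tracks), len(detections)
    if n_t <= n_d:
        candidates = (list(zip(range(n_t), p))
                      for p in permutations(range(n_d), n_t))
    else:
        candidates = (list(zip(p, range(n_d)))
                      for p in permutations(range(n_t), n_d))
    best_pairs, best_score = [], -1.0
    for pairs in candidates:
        score = sum(iou(tracks[t], detections[d]) for t, d in pairs)
        if score > best_score:
            best_pairs, best_score = pairs, score
    # Pairs that overlap too little do not count as matches.
    matches = [(t, d) for t, d in best_pairs
               if iou(tracks[t], detections[d]) >= min_iou]
    matched_t = {t for t, _ in matches}
    matched_d = {d for _, d in matches}
    unmatched_t = [t for t in range(n_t) if t not in matched_t]
    unmatched_d = [d for d in range(n_d) if d not in matched_d]
    return matches, unmatched_t, unmatched_d


predicted = [(0, 0, 10, 10), (50, 50, 60, 60)]           # KF-predicted boxes
detected = [(51, 50, 61, 60), (1, 0, 11, 10), (100, 100, 110, 110)]
matches, lost_tracks, new_detections = match(predicted, detected)
```

Here both predicted boxes find a partner and the third detection is left over, so it would start a new unconfirmed track. A real tracker would typically replace the exhaustive search with a Hungarian solver such as SciPy's `linear_sum_assignment`.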
If a track matches a detected bounding box a number of times in succession, it transitions to the confirmed state; the required number of consecutive matches is an adjustable threshold, typically set to 3. For a confirmed track, the KF-predicted box is cascade-matched against the detection boxes in the subsequent video frame, using the features extracted by the deep neural network. Tracks and detection boxes left unmatched by the cascade go on to IOU matching, and where a match succeeds the track is updated according to the position of the detection box. If a confirmed track fails IOU matching, it is not deleted immediately but only after a number of consecutive failures; this number is an adjustable threshold, typically set to 30.
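The track life cycle described above can be summarized in a small class. This is a sketch under assumptions: the class and method names are hypothetical, the creating detection is assumed not to count toward the 3 consecutive matches, and deletion is assumed to occur once the failure count exceeds 30.

```python
from dataclasses import dataclass

N_INIT = 3     # consecutive matches needed for confirmation (typical value)
MAX_AGE = 30   # consecutive misses a confirmed track survives (typical value)


@dataclass
class Track:
    track_id: int
    confirmed: bool = False
    hits: int = 0      # consecutive successful matches
    misses: int = 0    # consecutive failed matches
    deleted: bool = False

    def mark_matched(self) -> None:
        self.hits += 1
        self.misses = 0
        if not self.confirmed and self.hits >= N_INIT:
            self.confirmed = True

    def mark_missed(self) -> None:
        self.hits = 0
        self.misses += 1
        # Unconfirmed tracks are dropped at once; confirmed tracks get a
        # grace period so brief occlusions do not cost the person their ID.
        if not self.confirmed or self.misses > MAX_AGE:
            self.deleted = True


track = Track(track_id=1)
for _ in range(3):
    track.mark_matched()    # becomes confirmed on the third match
for _ in range(30):
    track.mark_missed()     # occluded for 30 frames, still alive
```

The grace period is what lets a person who is momentarily hidden keep the same ID, which the counting stage relies on.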
4. The coordinates of 33 human body key points are detected (see Fig. 4), comprising 11 on the head, 10 on the arms (both sides), 4 on the torso, 2 on the legs (both sides) and 6 on the feet (both sides). The specific key points are: 0-nose, 1-left eye (inner), 2-left eye, 3-left eye (outer), 4-right eye (inner), 5-right eye, 6-right eye (outer), 7-left ear, 8-right ear, 9-mouth (left), 10-mouth (right), 11-left shoulder, 12-right shoulder, 13-left elbow, 14-right elbow, 15-left wrist, 16-right wrist, 17-left pinky, 18-right pinky, 19-left index, 20-right index, 21-left thumb, 22-right thumb, 23-left hip, 24-right hip, 25-left knee, 26-right knee, 27-left ankle, 28-right ankle, 29-left heel, 30-right heel, 31-left foot index, 32-right foot index. The key point detection model is the Pose module of mediapipe. Compared with mainstream models such as YOLO-POSE, this module identifies more key points and runs faster, and its finer subdivision of the foot key points is a distinct advantage for events that require fine-grained judgment of the feet. Its drawback is that it uses a bottom-up architecture, that is, it identifies all candidate human key points in the image and selects those with the highest confidence to form a skeleton, so it only supports key point detection for a single person: when several persons to be detected appear in the image, the model outputs the key points of only the single "most credible" person. To address this, the method combines the model with target detection and target tracking, achieving fast multi-person skeleton key point recognition with many key points.
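Because the pose model runs on one cropped person at a time, each key point has to be mapped back into full-frame coordinates before persons can be compared with the venue. The sketch below illustrates this together with the 33-point grouping listed above. The names `KEYPOINT_GROUPS` and `crop_to_frame` are hypothetical, and the code assumes the pose model reports x and y normalized to [0, 1] relative to the crop.

```python
# Index groups following the numbering in the text (11 + 10 + 4 + 2 + 6 = 33).
KEYPOINT_GROUPS = {
    "head": range(0, 11),        # nose, eyes, ears, mouth
    "arms": range(13, 23),       # elbows, wrists, pinkies, indexes, thumbs
    "torso": (11, 12, 23, 24),   # shoulders and hips
    "legs": (25, 26),            # knees
    "feet": range(27, 33),       # ankles, heels, foot indexes
}


def crop_to_frame(x_norm, y_norm, box):
    """Map a key point normalized to a crop back to full-frame pixels.

    `box` is the crop's (x1, y1, x2, y2) in frame pixels, as produced by
    the detection stage.
    """
    x1, y1, x2, y2 = box
    return (x1 + x_norm * (x2 - x1), y1 + y_norm * (y2 - y1))


# A key point at the centre of a crop spanning (100, 200) to (300, 600).
centre = crop_to_frame(0.5, 0.5, (100, 200, 300, 600))
```

Running the single-person model once per tracked crop and translating the results this way is what turns it into a multi-person detector.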
5. For each sports event, the count is completed by a corresponding counting algorithm. Taking sit-up counting as an example (see Fig. 5), the specific steps of this embodiment are as follows:
5.1 The states of a person to be detected are defined as the ready state, the supine state, the sit-up state and the end state. Ready state: the person appears in the picture and is detected by the target detection model, but has not started doing sit-ups. Supine state: the person lies in the test position with the torso horizontal, the shoulder blades touching the ground, the legs naturally bent, the hands placed at the sides of the body close to the ground, and the soles of the feet touching the ground. Sit-up state: the person lifts the upper body to a certain angle using abdominal strength while moving the fingers forward to the standard line; during this process the feet and buttocks must not leave the ground. End state: the person enters the end state after standing up from the supine or sit-up state, and no further counting judgment is made after that.
5.2 Ready state determination: after a person to be detected appears in the video picture and is detected by the target detection model, the tracking algorithm assigns the person a unique ID and the person enters the ready state; the person's skeleton key points are identified from this point on.
5.3 The detected key points are screened by confidence before use. Where a key point exists on both sides of the body, data whose confidence is below the set threshold is discarded, and if both sides exceed the threshold their average is taken, which reduces misjudgment caused by errors of the key point detection model.
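The screening in 5.3 might be implemented along these lines. The function name `fuse_sides` and the threshold of 0.5 are assumptions for illustration; each key point is taken in the [x, y, visibility] format.

```python
def fuse_sides(left, right, threshold=0.5):
    """Combine a left/right key point pair given as (x, y, visibility).

    Returns the (x, y) average if both sides pass the threshold, the
    single passing side if only one does, and None if neither does.
    """
    usable = [p for p in (left, right) if p[2] >= threshold]
    if not usable:
        return None
    x = sum(p[0] for p in usable) / len(usable)
    y = sum(p[1] for p in usable) / len(usable)
    return (x, y)


# Both hips visible: their midpoint is used.
hip = fuse_sides((10, 20, 0.9), (14, 24, 0.8))
```

In a side-on camera view the far side of the body is often occluded, so falling back to the single visible side keeps the later angle checks usable.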
5.4 Supine state determination: the person to be detected should be lying horizontally rather than upright. The x-direction coordinate difference between the head key point and the foot key point should be greater than their y-direction difference, and the x-direction difference between the hip key point and the foot key point should be greater than their y-direction difference. The head key point uses the nose coordinates, and the foot key point uses the average of the three foot key points, namely the ankle, heel and toe. In addition, the person must lie flat on the ground with the shoulder blades close to the ground, which requires the angle formed by the head, hip and foot key points to be close to 180 degrees. The legs must be naturally bent, which requires the angle formed by the hip, knee and foot key points to be close to 90 degrees. The feet must touch the ground, which requires, with the horizontal lying condition satisfied, that the angle formed by the toe, heel and hip key points is also close to 180 degrees. Each angle is calculated from the coordinates of its three points, where (x2, y2) are the coordinates of the key point at the vertex of the angle.
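The angle formula itself is not reproduced in this text. One standard way to compute the angle at a vertex from three points, offered here as an assumption rather than as the patent's own expression, is the arccosine of the normalized dot product of the two vectors leaving the vertex (x2, y2). The helper names below are hypothetical.

```python
import math


def angle_at_vertex(p1, p2, p3):
    """Angle in degrees (0 to 180) at p2 between segments p2-p1 and p2-p3."""
    v1 = (p1[0] - p2[0], p1[1] - p2[1])
    v2 = (p3[0] - p2[0], p3[1] - p2[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("vertex coincides with an end point")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def is_horizontal(a, b):
    """True if two key points differ more in x than in y, as in 5.4."""
    return abs(a[0] - b[0]) > abs(a[1] - b[1])
```

With these, the supine check becomes a handful of comparisons, for example testing that `angle_at_vertex(hip, knee, foot)` lies within some tolerance of 90 degrees; the tolerance is not given in the text and would need tuning.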
5.5 Sit-up state determination: once the person to be detected has entered the supine state, the method starts judging whether the person enters the sit-up state. First, the frames of the sitting-up process are checked for faults such as lifting the hips or lifting the feet; if any occurs, the current count is invalidated. When the person enters the supine state, the coordinates of the hip and foot key points are recorded, and whether a fault occurs is judged from the change in the y-direction coordinates of the hips and feet during the sitting-up process. Foot lifting takes two forms, and both can be checked together simply by comparing the average of the three foot key points (ankle, heel and toe) with the initially recorded foot average. If no fault occurs, the person's rising angle is checked using the angle formed by the shoulder, hip and foot key points; the requirement is considered met when this angle reaches a threshold, normally set to 150 degrees, meaning that the upper body of the person forms an angle of more than 30 degrees with the ground. Finally, whether the hands have moved to the standard line is checked, judging from the finger coordinates and the standard line position whether the hands have crossed the line.
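The checks in 5.5 can be sketched as two small predicates. Everything beyond the 150-degree threshold is an assumption made for illustration: the function names, the pixel tolerance for lifting, image coordinates with y growing downward, and a standard line lying in the +x direction from the hands.

```python
def has_lift_fault(hip_y, foot_y, hip_y0, foot_y0, tolerance=15.0):
    """True if the hips or feet have risen noticeably off the ground.

    hip_y0 and foot_y0 are the y coordinates recorded on entering the
    supine state. With y growing downward, rising means a smaller y.
    """
    return (hip_y0 - hip_y > tolerance) or (foot_y0 - foot_y > tolerance)


def sit_up_reached(trunk_angle_deg, finger_x, line_x, angle_threshold=150.0):
    """True if the upper body is raised enough and the hand is over the line.

    trunk_angle_deg is the shoulder-hip-foot angle measured at the hip;
    150 degrees or less means the torso is at least 30 degrees off the ground.
    """
    raised_enough = trunk_angle_deg <= angle_threshold
    hand_over_line = finger_x >= line_x
    return raised_enough and hand_over_line
```

Keeping the fault test separate from the completion test lets the caller void a repetition as soon as a fault appears, without waiting for the person to finish rising.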
5.6 End state determination: if the person to be detected is currently in the supine or sit-up state and no longer satisfies the horizontal lying condition described in 5.4), that is, the person has stood up, the person enters the end state, and no further counting judgment is made.
5.7 When the person to be detected goes from the supine state into the sit-up state and then back into the supine state, one sit-up is considered completed and the person's sit-up count is incremented. The supine state can be entered from the ready state and the sit-up state, the sit-up state can be entered only from the supine state, and the end state can be entered from the supine state and the sit-up state.
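The state transitions in steps 5.1 to 5.7 can be sketched as a per-person state machine. This is an illustration under assumptions, not the patent's implementation: the class name, the boolean per-frame observations (`supine`, `sat_up`, `standing`, `fault`) and the way a faulted repetition is voided are all hypothetical choices.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()
    SUPINE = auto()
    SIT_UP = auto()
    END = auto()


class SitUpCounter:
    """One counter per tracked ID, fed with per-frame observations."""

    def __init__(self):
        self.state = State.READY
        self.count = 0
        self._rep_valid = True

    def update(self, supine, sat_up, standing=False, fault=False):
        if self.state is State.END:
            return self.count                 # no counting after the end state
        if standing and self.state in (State.SUPINE, State.SIT_UP):
            self.state = State.END
        elif self.state is State.READY:
            if supine:
                self.state = State.SUPINE
        elif self.state is State.SUPINE:
            if fault:
                self._rep_valid = False       # void the current repetition
            elif supine:
                self._rep_valid = True        # lying flat again: fresh attempt
            if sat_up:
                self.state = State.SIT_UP
        elif self.state is State.SIT_UP:
            if supine:                        # back down: repetition complete
                if self._rep_valid:
                    self.count += 1
                self._rep_valid = True
                self.state = State.SUPINE
        return self.count


counter = SitUpCounter()
frames = [
    dict(supine=True, sat_up=False),    # lies down: READY -> SUPINE
    dict(supine=False, sat_up=False),   # rising
    dict(supine=False, sat_up=True),    # SUPINE -> SIT_UP
    dict(supine=True, sat_up=False),    # back down: one repetition counted
]
for observation in frames:
    counter.update(**observation)
```

Keeping one `SitUpCounter` per tracker ID is what allows several persons in the same frame to be counted independently.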
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Modifications and equivalent replacements made by those skilled in the art without departing from the spirit and scope of the technical solution of the present invention shall all be covered by the scope of the claims of the present invention.

Claims (9)

Translated fromChinese
1.一种基于多关键点检测的多人体育项目计数方法,其特征在于,包括:1. A method for counting multi-person sports events based on multi-key point detection, characterized by comprising:1)获取场地的视频流数据;1) Obtain the video stream data of the venue;2)对视频流中视频帧进行预处理得到最终用于关键点检测的视频帧;2) Preprocessing the video frames in the video stream to obtain the final video frames for key point detection;3)检测视频帧中待检测人员的边界框位置并截取用于关键点检测的图像区域,同时对待检测人员进行追踪;3) Detect the bounding box position of the person to be detected in the video frame and capture the image area for key point detection, while tracking the person to be detected;4)检测人体中共33个关键点的坐标值,包括头部11个,手臂(两侧)10个,躯干4个,腿部(两侧)2个,脚部(两侧)6个;4) Detect the coordinate values of 33 key points on the human body, including 11 on the head, 10 on the arms (on both sides), 4 on the torso, 2 on the legs (on both sides), and 6 on the feet (on both sides);5)针对不同的体育项目完成相应项目的计数。5) Complete the counting of corresponding items for different sports.2.根据权利要求1所述的基于多关键点检测的多人体育项目计数方法,其特征在于,步骤2)包括:2. The method for counting multi-person sports events based on multi-key point detection according to claim 1, characterized in that step 2) comprises:2.1)对视频帧进行均匀采样,采样步长根据实际计算设备及原始视频流帧率进行设定,通常对于帧率大于等于30的视频流其采样后的视频帧为每秒20帧;2.1) The video frames are uniformly sampled. The sampling step is set according to the actual computing device and the frame rate of the original video stream. Usually, for a video stream with a frame rate greater than or equal to 30, the sampled video frames are 20 frames per second;2.2)设立一FramePool,其作用是存储采样后的视频帧,根据实际计算设备设置FramePool的大小,当其存储满且有新的帧进入时,丢弃最先进入FramePool的帧;2.2) Establish a FramePool, which is used to store sampled video frames. Set the size of the FramePool according to the actual computing device. 
When the storage is full and new frames enter, discard the first frame that enters the FramePool;2.3)处理线程会不断从FramePool中获取其中最早进入的帧,将其进行resize操作后进行后续的处理,具体为,分辨率大于1280×720的视频帧将被放缩为1280×720,分辨率过小的将不做处理,经过该预处理后,有效避免由于计算设备算力不足导致的视频流缓存溢出的问题。2.3) The processing thread will continuously obtain the earliest frames from the FramePool, resize them, and then perform subsequent processing. Specifically, video frames with a resolution greater than 1280×720 will be scaled to 1280×720, and frames with a resolution too small will not be processed. After this preprocessing, the problem of video stream cache overflow caused by insufficient computing power of the computing device can be effectively avoided.3.根据权利要求1所述的基于多关键点检测的多人体育项目计数方法,其特征在于,步骤3)包括,使用预训练好的目标检测模型对视频帧进行检测以确定待检测人员位置,随后使用追踪算法对检测到的人员进行目标追踪,即确定视频中每一个待检测人员的boundingbox和其独有的ID,其中其独有的ID是本系统进行分配的,仅为目标追踪使用。3. According to the multi-person sports counting method based on multi-key point detection in claim 1, it is characterized in that step 3) includes using a pre-trained target detection model to detect the video frame to determine the position of the person to be detected, and then using a tracking algorithm to track the detected person, that is, determining the bounding box and unique ID of each person to be detected in the video, wherein the unique ID is assigned by this system and is only used for target tracking.4.根据权利要求3所述的基于多关键点检测的多人体育项目计数方法,其特征在于,预训练好的目标检测模型是使用基于YOLO V8的深度卷积神经网络模型对权利要求2的步骤2.3)中获取到的视频帧进行目标检测,该模型的BackBone采用C2f模块作为基本构成单元,相比于上一代的C3模块,具有更少的参数量和更优秀的特征提取能力,更符合对视频流数据进行实时检测的需求;其Neck采用了多尺度特征融合技术,将来自Backbone的不同阶段的特征图进行融合,以增强特征表示能力;而Head部分负责最终的目标检测和分类任务,包括一个检测头和一个分类头;检测头包含一系列卷积层和反卷积层,用于生成检测结果;分类头则采用全局平均池化来对每个特征图进行分类;通过得到的bounding box信息,对图像中的关键点待检测区域、即包含有待检测人员的区域进行裁剪。4. 
According to the multi-player sports counting method based on multi-key point detection described in claim 3, it is characterized in that the pre-trained target detection model uses a deep convolutional neural network model based on YOLO V8 to perform target detection on the video frames obtained in step 2.3) of claim 2, and the BackBone of the model adopts the C2f module as the basic constituent unit. Compared with the previous generation C3 module, it has fewer parameters and better feature extraction capabilities, and is more in line with the needs of real-time detection of video stream data; its Neck adopts multi-scale feature fusion technology to fuse feature maps from different stages of Backbone to enhance feature representation capabilities; and the Head part is responsible for the final target detection and classification tasks, including a detection head and a classification head; the detection head includes a series of convolutional layers and deconvolution layers for generating detection results; the classification head uses global average pooling to classify each feature map; through the obtained bounding box information, the key point to be detected area in the image, that is, the area containing the person to be detected, is cropped.5.根据权利要求3所述的基于多关键点检测的多人体育项目计数方法,其特征在于,对待检测人员进行追踪,使用DeepSort算法,步骤包括5. 
The method for counting multiple sports events based on multi-key point detection according to claim 3 is characterized in that the personnel to be detected are tracked and the DeepSort algorithm is used, and the steps include:s1)对权利要求4中截取到的图像进行特征提取,该特征后续会用于级联匹配;s1) extracting features from the image captured in claim 4, which will be used for cascade matching later;s2)处理的第一个包含有若干人物检测框的帧中的人物检测框设定为初始轨迹;s2) setting the person detection frame in the first processed frame containing a plurality of person detection frames as the initial trajectory;s3)使用卡尔曼滤波预测人物轨迹,该轨迹分为确认态与非确认态,对于非确认态的轨迹框,将预测的人物bounding box与后续检测到的ground truth进行IOU匹配,通过匈牙利算法得到线性匹配结果,如果预测轨迹未匹配,则删除该轨迹,如果检测到的框未匹配,则生成一个新的轨迹,说明检测到了新的人员,如果匹配,则更新轨迹信息;对于确认态的轨迹,与检测到的bounding box进行级联匹配,该步骤中会同时使用s1)中提取到的外观特征和轨迹信息,如果匹配成功,则更新轨迹,匹配失败的轨迹会继续进行后续的IOU匹配;s3) Use Kalman filtering to predict the trajectory of the person. The trajectory is divided into confirmed and unconfirmed states. For the unconfirmed trajectory box, the predicted person bounding box is matched with the subsequently detected ground truth by IOU, and the linear matching result is obtained by the Hungarian algorithm. If the predicted trajectory does not match, the trajectory is deleted. If the detected box does not match, a new trajectory is generated, indicating that a new person is detected. If it matches, the trajectory information is updated. For the confirmed trajectory, cascade matching is performed with the detected bounding box. In this step, the appearance features and trajectory information extracted in s1) are used at the same time. If the match is successful, the trajectory is updated. 
The trajectory that fails to match will continue to perform subsequent IOU matching.s4)对于非确认态的轨迹,如果其连续若干次成功匹配,则其可以转化为确认态。s4) For a trajectory in an unconfirmed state, if it is successfully matched several times in a row, it can be converted to a confirmed state.s5)对于确认态轨迹,如果其IOU匹配失败次数大于设定的阈值,则删除该轨迹,否则,将该轨迹放回轨迹集合中,等待下一次匹配。s5) For a confirmed trajectory, if the number of IOU matching failures is greater than the set threshold, the trajectory is deleted; otherwise, the trajectory is put back into the trajectory set and waits for the next match.6.根据权利要求1所述的基于多关键点检测的多人体育项目计数方法,其特征在于,检测人体中共33个关键点的坐标值,使用mediapipe的姿态检测模块,具体检测的关键点为头部11个,包含眼睛,鼻子,嘴巴,耳朵;胳膊(两侧)10个,包含肘部,手腕,手指;躯干4个,包含肩膀和髋部;腿部(两侧)2个,包含膝盖;脚部(两侧)6个,包含脚踝,脚后跟,脚趾,获取的坐标为二维坐标以及相应的可见度信息,即单一关键点坐标格式为[x,y,visibility]。6. The multi-person sports counting method based on multi-key point detection according to claim 1 is characterized in that the coordinate values of 33 key points in the human body are detected, and the posture detection module of mediapipe is used. The key points specifically detected are 11 on the head, including eyes, nose, mouth, and ears; 10 on the arms (on both sides), including elbows, wrists, and fingers; 4 on the torso, including shoulders and hips; 2 on the legs (on both sides), including knees; 6 on the feet (on both sides), including ankles, heels, and toes. The acquired coordinates are two-dimensional coordinates and corresponding visibility information, that is, the coordinate format of a single key point is [x, y, visibility].7.根据权利要求1所述的基于多关键点检测的多人体育项目计数方法,其特征为,针对仰卧起坐项目,步骤如下:7. 
The multi-person sports event counting method based on multi-key point detection according to claim 1, characterized in that, for the sit-up event, the steps are as follows:
c1) defining the states of the person to be detected. Ready state: the person appears in the picture but has not started doing sit-ups. Supine state: the person lies in the test position with the torso horizontal, shoulder blades touching the ground, legs naturally bent, hands placed at the sides of the body close to the ground, and soles of the feet touching the ground. Sit-up state: the person lifts the upper body to a certain angle using abdominal strength while the fingers move forward to the standard line; the feet and buttocks must not leave the ground during the movement. End state: when the person stands up from the supine state or the sit-up state, the person enters the end state, after which no further counting determination is made;
c2) ready state determination: after the person to be detected appears in the video picture and is detected by the target detection model of claim 3, the tracking algorithm assigns the person a unique ID and the person enters the ready state;
c3) supine state determination: the person to be detected should be lying horizontally rather than standing upright.
In this case, the x-direction coordinate difference between the head key point and the foot key point should be greater than their y-direction coordinate difference, and the x-direction coordinate difference between the hip key point and the foot key point should be greater than their y-direction coordinate difference. In addition, the person must lie flat on the ground with the shoulder blades close to the ground, i.e. the angle formed by the head, hip and foot key points must be close to 180 degrees; the legs must be naturally bent, i.e. the angle formed by the hip, knee and foot key points must be close to 90 degrees; and the feet must touch the ground, i.e. with the horizontal lying requirement satisfied, the angle formed by the toe, heel and hip key points must also be close to 180 degrees;
c4) sit-up state determination: once the person to be detected has entered the supine state, the method starts determining whether the sit-up state is reached. First, the frames of the sit-up movement are checked for faults such as the hips or the feet lifting off the ground; if any occurs, this repetition is void. The hip and foot key point coordinates are recorded when the person enters the supine state, and changes in the y-direction coordinates of the hips and feet while the person rises are used to decide whether such a fault has occurred. If no fault has occurred, the rise angle is checked, using the angle formed by the shoulder, hip and foot key points.
When this angle meets a set threshold, the requirement is considered satisfied. Finally, the method checks whether the hands have moved to the standard line, deciding whether the hands have crossed the line from the finger coordinates and the coordinates of the standard line;
c5) end state determination: if the current state of the person to be detected is the supine state or the sit-up state, and the person no longer satisfies the horizontal lying condition described in c3), the person enters the end state, after which no further counting determination is made;
c6) a transition from the supine state to the sit-up state and back to the supine state is regarded as one completed sit-up, and the person's sit-up count is increased accordingly.
8. A multi-person sports event counting system based on multi-key point detection, comprising a computer device, characterized in that the computer device is programmed or configured to execute the steps of the multi-person sports event counting method based on multi-key point detection according to any one of claims 1 to 7, or a storage medium of the computer device stores a computer program programmed or configured to execute the multi-person sports event counting method based on multi-key point detection according to any one of claims 1 to 7.
9.
A computer-readable storage medium, characterized in that a computer program programmed or configured to execute the multi-person sports event counting method based on multi-key point detection as described in any one of claims 1 to 7 is stored on the computer-readable storage medium.
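The track lifecycle described in steps s3) to s5) of the DeepSort claim can be sketched as follows. This is a minimal illustration, not the patented implementation: it covers only the IOU score and the confirmed/unconfirmed bookkeeping, and leaves out the Kalman filter, appearance features, cascade matching and the Hungarian assignment. The names `iou`, `Track`, `n_init` and `max_age` and their default values are assumptions chosen for the example; the claims only speak of "several times in a row" and "the set threshold".

```python
from dataclasses import dataclass


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    """Hypothetical per-person track record (illustrative only)."""
    track_id: int
    box: tuple
    confirmed: bool = False
    hits: int = 0      # consecutive successful matches
    misses: int = 0    # consecutive failed matches
    deleted: bool = False

    def mark_matched(self, box, n_init=3):
        # A match updates the track; enough consecutive matches confirm it (s4).
        self.box = box
        self.hits += 1
        self.misses = 0
        if not self.confirmed and self.hits >= n_init:
            self.confirmed = True

    def mark_missed(self, max_age=30):
        # Unconfirmed tracks are dropped on the first miss (s3); confirmed
        # tracks are dropped once misses exceed the assumed threshold (s5).
        self.hits = 0
        self.misses += 1
        if not self.confirmed or self.misses > max_age:
            self.deleted = True
```

In a full tracker, the pairwise `iou` values would form the cost matrix handed to the Hungarian algorithm, and `mark_matched` or `mark_missed` would be called on each track according to the assignment result.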
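The grouping of the 33 key points in claim 6 can be written down as a small lookup table. The claims give only the counts per body region; the index ranges below are an assumption based on the usual MediaPipe Pose landmark numbering (0 to 10 for the face, 11 and 12 shoulders, 13 to 22 elbows, wrists and fingers, 23 and 24 hips, 25 and 26 knees, 27 to 32 ankles, heels and toes). `POSE_GROUPS` and `group_keypoints` are hypothetical names.

```python
# Assumed mapping from body region to MediaPipe Pose landmark indices.
POSE_GROUPS = {
    "head": list(range(0, 11)),    # nose, eyes, ears, mouth: 11 points
    "arms": list(range(13, 23)),   # elbows, wrists, fingers: 10 points
    "torso": [11, 12, 23, 24],     # shoulders and hips: 4 points
    "legs": [25, 26],              # knees: 2 points
    "feet": list(range(27, 33)),   # ankles, heels, toes: 6 points
}


def group_keypoints(keypoints):
    """Split 33 [x, y, visibility] entries into the body regions of claim 6."""
    if len(keypoints) != 33:
        raise ValueError("expected 33 key points, got %d" % len(keypoints))
    return {name: [keypoints[i] for i in idx] for name, idx in POSE_GROUPS.items()}
```

The per-region counts add up to 11 + 10 + 4 + 2 + 6 = 33, which matches the total stated in the claim.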
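The sit-up logic of claim 7 rests on two pieces: an angle formed by three key points (used in c3 and c4) and a state machine over the ready, supine, sit-up and end states (c1, c5, c6). The sketch below illustrates both under stated assumptions. `joint_angle`, `State` and `SitUpCounter` are hypothetical names, and the three boolean inputs to `update` stand in for the outcome of the c3 and c4 checks, whose thresholds the claims do not specify.

```python
import math
from enum import Enum


def joint_angle(a, b, c):
    """Angle in degrees at vertex b formed by the points a, b, c (each (x, y))."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate case: two key points coincide
    cos = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


class State(Enum):
    READY = 0
    SUPINE = 1
    SIT_UP = 2
    END = 3


class SitUpCounter:
    """Counts supine -> sit-up -> supine cycles for one tracked person."""

    def __init__(self):
        self.state = State.READY
        self.count = 0

    def update(self, lying, supine_ok, sit_up_ok):
        # lying: horizontal-lying condition of c3 holds
        # supine_ok: all supine checks of c3 pass
        # sit_up_ok: all sit-up checks of c4 pass (no fault, angle, hand past line)
        if self.state is State.END:
            return self.count  # no counting after the end state
        if self.state in (State.SUPINE, State.SIT_UP) and not lying:
            self.state = State.END            # c5
        elif self.state is State.SIT_UP and supine_ok:
            self.state = State.SUPINE         # c6: one repetition completed
            self.count += 1
        elif self.state is State.SUPINE and sit_up_ok:
            self.state = State.SIT_UP
        elif self.state is State.READY and supine_ok:
            self.state = State.SUPINE
        return self.count
```

One counter instance would be kept per tracking ID, so that several people in the same frame are counted independently.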
CN202411668090.1A | 2024-11-21 | 2024-11-21 | A multi-person sports event counting method, system and medium based on multi-key point detection | Pending | CN119693840A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202411668090.1A, CN119693840A (en) | 2024-11-21 | 2024-11-21 | A multi-person sports event counting method, system and medium based on multi-key point detection

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202411668090.1A, CN119693840A (en) | 2024-11-21 | 2024-11-21 | A multi-person sports event counting method, system and medium based on multi-key point detection

Publications (1)

Publication Number | Publication Date
CN119693840A (en) | 2025-03-25

Family

ID=95039891

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202411668090.1A (Pending, CN119693840A (en)) | A multi-person sports event counting method, system and medium based on multi-key point detection | 2024-11-21 | 2024-11-21

Country Status (1)

Country | Link
CN (1) | CN119693840A (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN110147743A (en) * | 2019-05-08 | 2019-08-20 | 中国石油大学(华东) | A real-time online pedestrian analysis and counting system and method in complex scenes
CN111368810A (en) * | 2020-05-26 | 2020-07-03 | 西南交通大学 | Sit-up detection system and method based on human body and skeleton key point recognition
CN112163516A (en) * | 2020-09-27 | 2021-01-01 | 深圳市悦动天下科技有限公司 | Rope skipping counting method and device and computer storage medium
CN112464715A (en) * | 2020-10-22 | 2021-03-09 | 南京理工大学 | Sit-up counting method based on human body bone point detection
US20210183095A1 (en) * | 2019-12-13 | 2021-06-17 | Zebra Technologies Corporation | Method, System and Apparatus for Detecting Item Facings
CN114783181A (en) * | 2022-04-13 | 2022-07-22 | 江苏集萃清联智控科技有限公司 | Traffic flow statistical method and device based on roadside perception
CN115272918A (en) * | 2022-07-08 | 2022-11-01 | 同济大学 | A method, device and storage medium for player tracking and ball holder identification
CN116052277A (en) * | 2023-02-02 | 2023-05-02 | 光彻科技(杭州)有限公司 | Sit-up counting and anti-cheating system based on human body detection and skeleton key point detection
CN116416551A (en) * | 2023-01-06 | 2023-07-11 | 浙江大学计算机创新技术研究院 | Video image multi-person self-adaptive rope skipping intelligent counting method based on tracking algorithm
CN116486479A (en) * | 2023-04-04 | 2023-07-25 | 北京百度网讯科技有限公司 | Physical fitness detection method, device, equipment and storage medium
CN117253290A (en) * | 2023-10-13 | 2023-12-19 | 景色智慧(北京)信息科技有限公司 | Rope skipping counting implementation method and device based on yolopose model and storage medium
CN117315770A (en) * | 2023-08-04 | 2023-12-29 | 深圳大学 | Human behavior recognition method, device and storage medium based on skeleton points
CN118230427A (en) * | 2024-05-24 | 2024-06-21 | 浪潮软件科技有限公司 | Sit-up counting method and system suitable for online sports activities
CN118397700A (en) * | 2024-04-29 | 2024-07-26 | 北京科技大学 | A human drowning detection method and detection system based on ST-GCN
CN118747770A (en) * | 2024-06-07 | 2024-10-08 | 江淮前沿技术协同创新中心 | Vision-based dynamic multi-target recognition, pose estimation and tracking method and system
CN118968629A (en) * | 2024-08-23 | 2024-11-15 | 山东科技大学 | Automatic evaluation method of sit-up action quality based on posture key points

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
沈守娟等: "基于YOLOv3算法的教室学生检测与人数统计方法", 软件导刊, no. 09, 15 September 2020 (2020-09-15)*

Similar Documents

Publication | Title
US20180047175A1 (en) | Method for implementing human skeleton tracking system based on depth data
US7899206B2 (en) | Device, system and method for determining compliance with a positioning instruction by a figure in an image
WO2021129064A1 (en) | Posture acquisition method and device, and key point coordinate positioning model training method and device
CN104035557B (en) | Kinect action identification method based on joint activeness
CN100541540C (en) | 3D Human Motion Restoration Method Based on Silhouette and End Nodes
CN114049590B (en) | A video-based ski jumping analysis method
CN115116127A (en) | A fall detection method based on computer vision and artificial intelligence
CN113408435B (en) | A security monitoring method, device, equipment and storage medium
JP2000251078A (en) | Method and device for estimating three-dimensional posture of person, and method and device for estimating position of elbow of person
CN115761901B (en) | A method for detecting and evaluating riding posture
CN114299050A (en) | An infrared image fall detection method based on improved Alphapose
CN106650628A (en) | Fingertip detection method based on three-dimensional K curvature
CN116958872A (en) | Intelligent auxiliary training method and system for badminton
CN117037272B (en) | Method and system for monitoring fall of old people
CN110163046A (en) | Human posture recognition method, device, server and storage medium
CN118397700A (en) | A human drowning detection method and detection system based on ST-GCN
CN118781654A (en) | A method for ICU patient motion video recognition based on human skeleton key point detection
CN115909499A (en) | Multi-person human body posture estimation method based on deep learning
CN114550071B (en) | Method, device and medium for automatically identifying and capturing track and field video action key frames
CN114639160A (en) | Method for defining human head action, posture and joint relation through visual recognition
WO2016021152A1 (en) | Orientation estimation method, and orientation estimation device
CN119097892A (en) | Sports evaluation method, device, electronic equipment and storage medium
CN109241952A (en) | Personage's method of counting and device under crowd scene
CN119693840A (en) | A multi-person sports event counting method, system and medium based on multi-key point detection
CN112036324A (en) | A method and system for human posture determination for complex multi-person scenes

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
