Detailed Description
The invention is described in further detail below with reference to the attached drawings and specific embodiments.
This embodiment describes an automatic sit-up motion quality evaluation method based on pose keypoints, as shown in fig. 1, comprising the following steps:
Step 1: First, a sit-up video dataset is acquired, and key action frames are extracted from the acquired videos and labeled.
Since no sit-up video dataset exists among the public datasets, the present invention builds its own sit-up video dataset, named SDUST-Situp. The SDUST-Situp dataset comprises 108 videos in 32 groups, each group containing several consecutive standard and non-standard sit-up action videos.
The action quality scores of the sit-up actions in the videos are labeled, and frame extraction is then performed on the videos.
The preprocessing operation is specifically as follows:
First, 16 frames are extracted from each video using a uniform frame-extraction strategy. Then, frame similarity estimation is performed on the 16 frames of each video using the video similarity learning network ViSiL, redundant frames with high similarity are removed, and 4 key frames are finally retained, namely the start key frame, the upper-body lifting key frame, the abdomen lifting key frame, and the end key frame, as shown in fig. 2. In this way, the key features of a single sit-up action are preserved while the amount of data to be processed is reduced.
The uniform frame extraction strategy is specifically as follows:
First, the input sit-up video file is read and the total number of frames of each video is obtained. With the number of extracted frames fixed at 16, the frame-extraction interval step is calculated from the total frame count (step = total frames / 16), and one image is stored every step frames; after this loop, 16 uniformly spaced frames are stored for each video.
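As an illustration, a minimal sketch of this uniform frame-extraction strategy using OpenCV; the function name and the way frames are read are assumptions rather than the invention's exact implementation:

```python
import cv2

def extract_uniform_frames(video_path, num_frames=16):
    """Uniformly sample num_frames images from a video: step = total_frames / num_frames."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = max(total // num_frames, 1)          # frame-extraction interval
    frames = []
    for idx in range(0, total, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)   # jump to the idx-th frame
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        if len(frames) == num_frames:
            break
    cap.release()
    return frames
```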
Inter-frame similarity is then computed on the 16 frames stored for each video, and redundant frames with high similarity are removed using the ViSiL similarity learning network. The specific process is as follows:
First, a convolutional neural network (CNN) is used to extract the spatial information of the 16 frames of each video in sequence, including features such as color, texture, and shape; after the spatial features are extracted, a recurrent neural network (RNN) is used to capture the temporal information between frames, such as motion and change features.
Second, the spatio-temporal features are fused by bilinear fusion, and the cosine similarity between the features of every pair of frames is calculated.
Finally, to reject redundant frames with high similarity, a similarity threshold is set, for example 0.8: when the similarity between two frames exceeds the threshold, the earlier frame is retained and the later frame is rejected. This is carried out sequentially until 4 key frames remain, as sketched below.
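A minimal sketch of this threshold-based redundant-frame removal, assuming per-frame feature vectors (for example, the fused spatio-temporal descriptors produced by ViSiL) are already available; the function and variable names are hypothetical:

```python
import torch.nn.functional as F

def remove_redundant_frames(frames, features, threshold=0.8, keep=4):
    """frames: list of 16 images; features: (16, D) tensor of per-frame descriptors.
    Keep the earlier frame of any pair whose cosine similarity exceeds the threshold."""
    kept_frames, kept_feats = [frames[0]], [features[0]]
    for frame, feat in zip(frames[1:], features[1:]):
        sim = F.cosine_similarity(kept_feats[-1].unsqueeze(0), feat.unsqueeze(0)).item()
        if sim <= threshold:                    # not redundant: keep this frame as well
            kept_frames.append(frame)
            kept_feats.append(feat)
    return kept_frames[:keep]                   # retain at most 4 key frames
```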
Step 1.2: The extracted key frames are labeled using the lightweight graphical labeling software Labelme.
The key postures are labeled with the minimum bounding rectangle of the target; the labels are Supine, Lift, Rise, and Achieve, representing the start, upper-body lifting, abdomen lifting, and end actions, respectively. The human pose keypoint labels are wrist, elbow, shoulder, hip, knee, and ankle, representing the six pose keypoints of the wrists, elbows, shoulders, hips, knees, and ankles, respectively.
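For reference, the label schema just described can be summarized as follows; the constant names are illustrative, and the actual Labelme annotation files are not reproduced here:

```python
# Key posture (bounding-box) classes and pose keypoint classes used for annotation
POSTURE_CLASSES = ["Supine", "Lift", "Rise", "Achieve"]   # start, upper-body lift, abdomen lift, end
KEYPOINT_CLASSES = ["wrist", "elbow", "shoulder", "hip", "knee", "ankle"]
```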
Through this processing, 404 key posture annotations and 2424 keypoint annotations are obtained, and the labeled results are fed into the key posture estimation and keypoint detection network for training to obtain the key posture and keypoint detection results.
Step 2: A key posture estimation and keypoint detection network is constructed based on the improved YOLOv-Pose model, and the key frames extracted in Step 1 are fed into this network for posture estimation and keypoint detection.
Since the YOLOv-Pose model performs excellently on keypoint detection, the invention improves on the YOLOv-Pose model and applies it to posture estimation and keypoint detection of sit-up actions; the model structure is shown in fig. 3.
The modified YOLOv-Pose model includes a backbone network, a neck network, and a head network.
In the backbone network, a global attention mechanism GAM is introduced so that the network attends more comprehensively to the feature regions relevant to the human body and effectively integrates context information across the different channels of the feature map, improving the accuracy and robustness of human posture detection. Aggregated features are then formed at multiple scales through the spatial pyramid pooling layer. A multi-scale attention mechanism EMA is then introduced to adaptively emphasize the importance of features at different scales, improving the detection and localization of human skeleton keypoints.
The output of the backbone network is then fused and enhanced by the neck network; finally, the detection head network makes the decisions and generates the final detections, and the YOLOv-Pose model outputs the key posture and keypoint detection results.
The backbone network includes a convolution module, a C2f module, a global attention mechanism GAM module, a spatial pyramid pooling layer, and a multi-scale attention mechanism EMA module, as shown in fig. 3. The processing flow of the signals in the backbone network is as follows:
The input key-frame image sequence first passes through two convolution modules for image feature extraction, then through a C2f module that captures complex features in the image, followed by another convolution module, another C2f module, and a further convolution module. The features then enter the global attention mechanism GAM, which lets the network attend more comprehensively to the feature regions relevant to the human body and effectively integrate context information across the different channels of the feature map. Aggregated features are then formed at multiple scales by the spatial pyramid pooling layer, and a multi-scale attention mechanism EMA adaptively emphasizes the importance of features at different scales to improve the detection and localization of human skeleton keypoints.
The specific structure of the global attention mechanism GAM module is shown in fig. 4. It effectively extracts feature information across channels and across the three dimensions while preserving the integrity of the original information.
The global attention mechanism GAM module includes a channel attention module MC and a spatial attention module MS.
The input feature F1 first passes through the channel attention module MC, which captures its important channel information to obtain F2; F2 then passes through the spatial attention module MS, which further highlights the highly relevant spatial features to obtain the output F3.
The process flow is represented as follows:
F2 = MC(F1) ⊗ F1, F3 = MS(F2) ⊗ F2 (1)
where ⊗ denotes element-wise multiplication, and MC(·) and MS(·) denote the channel attention process and the spatial attention process, respectively.
By introducing the GAM module, the YOLOv-Pose network can attend more comprehensively to the feature regions relevant to the human body and effectively integrate context information across the different channels of the feature map, improving the accuracy and robustness of human posture detection.
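A minimal PyTorch sketch of a GAM-style block following formula (1), with channel attention MC followed by spatial attention MS; the reduction rate and layer sizes are assumptions rather than values specified by the invention:

```python
import torch.nn as nn

class GAM(nn.Module):
    """Global Attention Mechanism: F2 = MC(F1) * F1, F3 = MS(F2) * F2."""
    def __init__(self, channels, rate=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(                 # MC: MLP across the channel dimension
            nn.Linear(channels, channels // rate),
            nn.ReLU(inplace=True),
            nn.Linear(channels // rate, channels),
        )
        self.spatial_conv = nn.Sequential(                # MS: 7x7 convolutions over space
            nn.Conv2d(channels, channels // rate, 7, padding=3),
            nn.BatchNorm2d(channels // rate),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // rate, channels, 7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Channel attention: a shared MLP over each position's channel vector
        attn_c = self.channel_mlp(x.permute(0, 2, 3, 1).reshape(b, -1, c))
        attn_c = attn_c.reshape(b, h, w, c).permute(0, 3, 1, 2).sigmoid()
        f2 = x * attn_c                                   # element-wise re-weighting
        # Spatial attention on F2
        attn_s = self.spatial_conv(f2).sigmoid()
        return f2 * attn_s                                # F3
```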
The specific structure of the multi-scale attention mechanism EMA module is shown in fig. 5. A parallel substructure is adopted to reduce the network depth, so that better pixel-level attention is generated for high-level feature maps without reducing the channel dimension, improving multi-dimensional perception and multi-dimensional feature extraction. The processing flow of signals in the multi-scale attention mechanism EMA module is as follows:
The input aggregated features are first divided into multiple sub-features to form feature groups, which are processed along three parallel paths in two branches. The first branch is the 1×1 branch and contains two parallel paths: one-dimensional horizontal global pooling and one-dimensional vertical global pooling encode the feature groups along the two spatial directions; the two encoded features are concatenated and passed through a shared 1×1 convolution; the output of the 1×1 convolution is decomposed into two vectors, each passed through a nonlinear Sigmoid activation function; the two vectors re-weight the feature groups, followed by group normalization; finally, the features are reshaped by average pooling and normalized with a Softmax function. The second branch is the 3×3 branch: the feature groups capture local cross-channel interactions through a 3×3 convolution to expand the feature space, the features are reshaped by average pooling and normalized with a Softmax function, and a matrix multiplication (Matmul) with the group-normalized features of the first branch yields a first feature matrix; meanwhile, a matrix multiplication of the 3×3-convolved features with the Softmax-normalized features of the first branch yields a second feature matrix. The first and second feature matrices are added and passed through a Sigmoid activation function to generate the attention weight matrix. Finally, the input feature groups are re-weighted by the attention weight matrix to obtain the output features optimized by the EMA attention mechanism; a code sketch of this module is given below.
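A PyTorch sketch of an EMA-style block along the lines of the published Efficient Multi-scale Attention module; the grouping factor and the exact layer arrangement are assumptions and may differ from the configuration used in the invention:

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Efficient Multi-scale Attention: a 1x1 branch (directional pooling) and a 3x3 branch,
    combined by cross-branch matrix multiplication into a pixel-level attention map."""
    def __init__(self, channels, groups=8):
        super().__init__()
        self.groups = groups
        c = channels // groups
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))     # 1-D horizontal global pooling
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))     # 1-D vertical global pooling
        self.agp = nn.AdaptiveAvgPool2d(1)                # average pooling used before Softmax
        self.conv1x1 = nn.Conv2d(c, c, 1)
        self.conv3x3 = nn.Conv2d(c, c, 3, padding=1)
        self.gn = nn.GroupNorm(c, c)
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, x):
        b, ch, h, w = x.shape
        g = x.reshape(b * self.groups, -1, h, w)          # split into sub-feature groups
        c = g.shape[1]
        # 1x1 branch: encode along both spatial directions, shared 1x1 conv, split, Sigmoid
        x_h = self.pool_h(g)                              # (bg, c, h, 1)
        x_w = self.pool_w(g).permute(0, 1, 3, 2)          # (bg, c, w, 1)
        y = self.conv1x1(torch.cat([x_h, x_w], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        x1 = self.gn(g * y_h.sigmoid() * y_w.permute(0, 1, 3, 2).sigmoid())
        # 3x3 branch: local cross-channel interaction
        x2 = self.conv3x3(g)
        # Cross-branch matrix multiplications produce the attention weights
        a1 = self.softmax(self.agp(x1).reshape(b * self.groups, c, 1).permute(0, 2, 1))
        a2 = self.softmax(self.agp(x2).reshape(b * self.groups, c, 1).permute(0, 2, 1))
        m1 = torch.matmul(a1, x2.reshape(b * self.groups, c, -1))   # first feature matrix
        m2 = torch.matmul(a2, x1.reshape(b * self.groups, c, -1))   # second feature matrix
        weights = (m1 + m2).reshape(b * self.groups, 1, h, w).sigmoid()
        return (g * weights).reshape(b, ch, h, w)         # re-weight the input feature groups
```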
By introducing the EMA module, the YOLOv-Pose network can adaptively emphasize the importance of features at different scales, improving the detection and localization of human skeleton keypoints.
Considering the action characteristics of sit-ups and the actual measurement environment, the invention proposes a Situp-PoseNet model based on the YOLOv-Pose network: a Global Attention Mechanism (GAM) and an Efficient Multi-Scale Attention (EMA) mechanism are integrated into the YOLOv-Pose network, which effectively improves the detection of human postures and the localization of skeleton keypoints and makes the evaluation results of the scoring network more accurate.
Step 3: Finally, the keypoint data obtained by the key posture estimation and keypoint detection network are sent to a score evaluation network for weighted scoring of the key action points, completing the quality evaluation of the sit-up action.
Based on video analysis and understanding of sit-up actions, the invention designs a score evaluation network that implements the quality scoring standard and the weighted scoring. Specifically, 4 key stages (postures) of the sit-up process are selected as evaluation objects, and different weights are assigned to the 4 key postures. Each key posture has 3 action points, namely holding the head with both hands, bending the knees at 90 degrees, and folding the body, and each action point also has its own weight. The action completion quality and non-standard actions are evaluated according to the action points of each stage, thereby completing the weighted action quality scoring of the sit-up.
Compared with the previous manual scoring strategy, which is highly subjective, and with methods that focus only on action counting and easily neglect action quality, the scoring strategy proposed by the invention focuses on evaluating the stages, consistency, and degree of standardization of the action.
By subdividing the key postures and action points and assigning them different weights, the method reflects the accuracy and standardization of the action more reasonably, helps learners better understand the correct postures and action points of sit-ups, and encourages them to pay more attention to the quality and standardization of the actions rather than merely pursuing quantity, achieving more scientific and fair scoring.
The score evaluation network structure is shown in fig. 6, and the overall processing idea is as follows:
After the key posture estimation and keypoint detection network obtains the positions and corresponding coordinates of each keypoint in the sit-up key frames, they are sent to the scoring network, where the keypoint angles are calculated, the weight coefficients are assigned, and the deduction items are judged; finally, the predicted action quality score is obtained by weighted summation, completing the quality evaluation of the sit-up action.
According to the 1-minute sit-up test standard, in order to score the completion quality of sit-up actions, 4 key stages (postures) of the sit-up process are selected as evaluation objects, namely:
The sit-up start posture P1: the subject lies supine on a mat, with both shoulder blades touching the mat, the knees bent at about 90 degrees, and both hands holding the head. The upper-body lifting posture P2: the subject's upper body is lifted off the mat, with both hands holding the head. The abdomen lifting posture P3: the subject completes the sitting-up phase using abdominal strength, with both hands holding the head. The sit-up end posture P4: the subject sits up with both elbows touching or going beyond the knees.
The standard postures P1 to P4 are shown schematically in fig. 7, where P1 is the standard posture at the start of the test, P2 is the standard posture of upper-body lifting, P3 is the standard posture of abdomen lifting, and P4 is the standard posture at the end of the sit-up.
To automatically score the motion quality of sit-ups based on video analysis and understanding, the positions of the skeleton keypoints in these 4 key stages need to be accurately tracked and matched with the standard postures to obtain the scores.
As shown in fig. 8(a), six keypoints A to F in the video action skeleton are selected as tracking targets for pose estimation: point A is located at the shoulder joint, with coordinates denoted (x1, y1); point B is at the hip joint (x2, y2); point C is at the knee joint (x3, y3); point D is at the elbow joint (x4, y4); point E is at the wrist joint (x5, y5); and point F is at the ankle joint (x6, y6). Fig. 8(b) is an example frame of stage P3 with the region of interest marked by a rectangular box, and fig. 8(c) shows the 6 keypoints captured from that frame.
To evaluate the action completion quality of each stage, judgment is made according to the importance of each key posture and the completion quality of that posture. First, the 4 key stages are given different weights, namely 0.3 for P1, 0.2 for P2, 0.2 for P3, and 0.3 for P4. Then the action completion quality and non-standard actions are evaluated according to the action points of each stage. The evaluation criterion is that the degrees of holding the head with both hands, bending the knees at 90 degrees, and folding the body are scored according to how well the corresponding keypoint angles match the standard.
The specific weights and scores for each action point are shown in table 1.
Table 1 sit-up action scoring method based on key point angles
The specific process of evaluating the action completion quality and non-standard actions is as follows:
The standardization of the hands-holding-head action is defined and judged by the arm folding angle D: when angle D is less than or equal to 45 degrees, the action is considered a standard hands-holding-head action and is scored 5;
when angle D is greater than 45 degrees, scores of 4, 3, 2, 1, and 0 are given according to the degree of non-standardness of the action.
The calculation formula of the angle D is as follows:
D = arccos( [(x1 − x4)(x5 − x4) + (y1 − y4)(y5 − y4)] / [ √((x1 − x4)² + (y1 − y4)²) · √((x5 − x4)² + (y5 − y4)²) ] ) (2)
Whether the subject's knees are bent at 90 degrees is judged by the leg angle C: when angle C is greater than 80 degrees and less than 95 degrees, the action is considered a standard knee-bending action and is scored 5;
when angle C is outside this range, scores of 4, 3, 2, 1, and 0 are given in turn according to the degree of non-standardness of the action.
The calculation formula of the angle C is as follows:
C = arccos( [(x2 − x3)(x6 − x3) + (y2 − y3)(y6 − y3)] / [ √((x2 − x3)² + (y2 − y3)²) · √((x6 − x3)² + (y6 − y3)²) ] ) (3)
Whether the subject lies back down is judged by the degree of body folding: when the body folding angle B is greater than or equal to 120 degrees, the action is considered a standard lying-back action and is scored 5;
when angle B is less than 120 degrees, scores of 4, 3, 2, 1, and 0 are given in turn according to the degree of non-standardness of the action.
The standardization of the sitting-up action is judged by the angle B': when the body folding angle B' is less than or equal to 60 degrees and the elbows touch or go beyond the knees, the score is 5;
when angle B' is greater than 60 degrees, scores of 4, 3, 2, 1, and 0 are given in turn according to the degree of non-standardness of the action.
The calculation formulas of the angle B and the angle B' are the same, as shown in the formula (4):
B = arccos( [(x1 − x2)(x3 − x2) + (y1 − y2)(y3 − y2)] / [ √((x1 − x2)² + (y1 − y2)²) · √((x3 − x2)² + (y3 − y2)²) ] ) (4)
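For illustration, a minimal sketch of the three-point angle computation underlying formulas (2) to (4), assuming each angle is measured at the middle joint between two adjacent keypoints (shoulder-elbow-wrist for D, hip-knee-ankle for C, and shoulder-hip-knee for B and B'); the coordinate values are placeholders:

```python
import numpy as np

def joint_angle(center, p1, p2):
    """Angle in degrees at 'center' between the rays to p1 and p2 (arccos of the dot product)."""
    v1 = np.asarray(p1, float) - np.asarray(center, float)
    v2 = np.asarray(p2, float) - np.asarray(center, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Placeholder keypoint coordinates (x, y) as output by the detector
shoulder, hip, knee = (410, 220), (480, 300), (430, 180)
elbow, wrist, ankle = (360, 200), (330, 240), (520, 340)

angle_D = joint_angle(elbow, shoulder, wrist)   # arm folding angle, hands-behind-head check
angle_C = joint_angle(knee, hip, ankle)         # leg angle, 90-degree knee-bend check
angle_B = joint_angle(hip, shoulder, knee)      # body folding angle, used for B and B'
```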
In summary, based on the 4 key stages PM of the sit-up video and the 3 key action points PMN of each stage, namely holding the head with both hands, bending the knees at 90 degrees, and folding the body, the final prediction scoring formula is:
SP = Σ(M=1 to 4) βM · Σ(N=1 to 3) αMN · PMN (5)
where SP is the final prediction score, PMN is the score obtained for key action point N of stage M, αMN is the weight of each key action point, βM is the weight of each key stage, M is the key stage index, and N is the key action point index.
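A minimal sketch of the weighted summation in formula (5); the stage weights βM come from the description above, while the per-action-point weights αMN are equal placeholders here because the actual values are specified in Table 1:

```python
STAGE_WEIGHTS = {"P1": 0.3, "P2": 0.2, "P3": 0.2, "P4": 0.3}          # beta_M
ACTION_POINT_WEIGHTS = {"hands_on_head": 1 / 3, "knee_bend": 1 / 3,   # alpha_MN (placeholders;
                        "body_fold": 1 / 3}                           #  actual values in Table 1)

def predict_score(point_scores):
    """point_scores: {stage: {action_point: score in 0..5}} -> final weighted score SP."""
    return sum(
        STAGE_WEIGHTS[stage] * sum(ACTION_POINT_WEIGHTS[ap] * s for ap, s in scores.items())
        for stage, scores in point_scores.items()
    )

# Example: a fully standard repetition (every action point scored 5) yields 5.0
example = {m: {"hands_on_head": 5, "knee_bend": 5, "body_fold": 5} for m in STAGE_WEIGHTS}
print(predict_score(example))   # 5.0
```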
There is no published sit-up motion video dataset, so to verify the effectiveness of the method of the invention, a sit-up video dataset SDUST-Situp was built. The dataset comprises 108 mp4 videos in 32 groups; each video clip is about 2-3 seconds long, and the recording device is an iPhone 13 Pro. The sit-up test environment was arranged according to the "1-minute sit-up" test standard. For each volunteer, the complete motion of a sit-up from start to end was captured. Videos with significant blur, jitter, or poor illumination were removed, as were still segments at the beginning and end. The video resolution and frame rate were normalized to 720×720 and 30 fps, respectively.
The training set and validation set are split at a ratio of 4:1, namely 324 frames for training and 80 frames for validation.
For the sit-up posture detection and keypoint detection tasks, mAP is used as the evaluation metric, namely mAP@0.5 and mAP@0.5-0.95. mAP@0.5 denotes the average precision at IoU = 0.5, and mAP@0.5-0.95 denotes the average precision over different IoU values (from 0.5 to 0.95 in steps of 0.05).
IoU (Intersection over Union) denotes the degree of overlap between the predicted box and the ground-truth box.
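For reference, a minimal sketch of the IoU computation for two axis-aligned boxes in (x1, y1, x2, y2) format:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-8)
```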
The detection accuracy of the keypoints is evaluated by mAPPose, calculated as:
mAPPose = (1/N) · Σ(i=1 to N) APPose,i (6)
In the formula, N represents the number of categories of the key point.
APPose represents the average precision of keypoint detection, calculated as:
APPose = (Σp β) / (Σp 1) (7)
where the sums run over all predicted poses p.
OKS (object keypoint similarity) denotes the similarity between the true keypoint and the predicted keypoint, calculated by:
OKSp = [ Σi exp(−dpi² / (2·Sp²·Δi²)) · δ(vpi > 0) ] / [ Σi δ(vpi > 0) ] (8)
where dpi denotes the Euclidean distance between the detected position and the ground-truth position of the i-th keypoint, Sp denotes the scale factor of pose p, and vpi denotes the visibility of the keypoint: 0 means unlabeled, 1 means labeled but occluded, and 2 means labeled and visible.
Δi denotes the normalization factor of the i-th type of keypoint, and δ(·) is an indicator that is 1 when vpi > 0 and 0 when vpi ≤ 0.
T is a given threshold; when OKSp > T, β takes the value of OKSp, otherwise β is 0.
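A minimal sketch of a COCO-style OKS computation in the spirit of formula (8); the per-keypoint normalization factors sigmas are placeholders, since the actual Δi values for the six sit-up keypoints are not reproduced here:

```python
import numpy as np

def oks(pred, gt, vis, scale, sigmas):
    """pred, gt: (K, 2) keypoint arrays; vis: (K,) visibility flags; scale: S_p; sigmas: (K,) Delta_i."""
    d2 = np.sum((np.asarray(pred, float) - np.asarray(gt, float)) ** 2, axis=1)   # squared distances d_pi^2
    e = d2 / (2.0 * scale ** 2 * np.asarray(sigmas, float) ** 2 + 1e-8)
    labeled = np.asarray(vis) > 0                       # delta(v_pi > 0)
    return float(np.sum(np.exp(-e)[labeled]) / max(labeled.sum(), 1))
```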
The detection accuracy of the action postures is evaluated by mAPBox, calculated as:
mAPBox = (1/M) · Σ(j=1 to M) APBox,j (9)
where M denotes the number of action posture categories. APBox denotes the average precision of action posture detection, which is obtained as the area under the P-R curve; this area can be computed by integration:
APBox = ∫₀¹ P(R) dR (10)
where P denotes precision and R denotes recall, calculated as:
P = TPs / (TPs + FPs) (11)
R = TPs / (TPs + FNs) (12)
where s denotes a certain action posture and non-s denotes states or actions other than s; TPs denotes the number of frames correctly classified as s, FPs denotes the number of frames misclassified as s, and FNs denotes the number of frames of s misclassified as non-s.
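A minimal sketch of the precision, recall, and P-R-curve-area computations in formulas (10) to (12), using trapezoidal integration over recall:

```python
import numpy as np

def precision_recall(tp, fp, fn):
    """Per-class precision (formula (11)) and recall (formula (12)) from frame counts."""
    precision = tp / (tp + fp + 1e-8)
    recall = tp / (tp + fn + 1e-8)
    return precision, recall

def average_precision(precisions, recalls):
    """Area under the P-R curve (formula (10)) via trapezoidal integration over recall."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))
```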
The prediction accuracy of the action quality score is evaluated by the mean squared error (MSE) and the Spearman rank correlation coefficient ρ, which reflect the degree of agreement between the predicted scores and the true scores.
Let SP denote the prediction score of the model and SG denote the score labeled by the expert (score range 0-5). The mean squared error MSE and the Spearman rank correlation coefficient ρ are calculated as:
MSE = (1/L) · Σ(l=1 to L) (SlP − SlG)² (13)
ρ = 1 − [ 6 · Σ(l=1 to L) (R(SlP) − R(SlG))² ] / [ L · (L² − 1) ] (14)
where L denotes the number of videos, and SlP and SlG denote the prediction score and the true score of the l-th video, respectively. R(SlP) and R(SlG) denote the ranks of the predicted score and the true score within their respective score series, and the superscript l denotes the video index.
The value of ρ lies in [−1, 1]; the higher the value, the better the score prediction performance of the network.
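A minimal sketch of the two score-prediction metrics, using SciPy's implementation of the Spearman rank correlation:

```python
import numpy as np
from scipy.stats import spearmanr

def score_metrics(pred_scores, true_scores):
    """Mean squared error (formula (13)) and Spearman rank correlation (formula (14))."""
    pred = np.asarray(pred_scores, dtype=float)
    true = np.asarray(true_scores, dtype=float)
    mse = float(np.mean((pred - true) ** 2))
    rho, _ = spearmanr(pred, true)
    return mse, float(rho)
```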
All experiments were run on a 64-bit Windows 10 operating system with an Intel(R) Xeon(R) Silver 4210R CPU, 128 GB of memory, and an NVIDIA RTX A6000 GPU with 48 GB of video memory.
Programming was implemented in Python 3.8 with PyTorch-GPU 1.9.0 and CUDA 11.1. The input image size was 640×640, the batch size was 32, the initial learning rate was 0.01, the decay factor was 0.0005, and the number of epochs was 300; training was accelerated with the AdamW optimizer. Parameters not mentioned use the official YOLOv-Pose defaults.
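As an illustration only, a sketch of such a training configuration assuming the Ultralytics YOLO pose API as the baseline implementation; the checkpoint name and dataset YAML path are hypothetical and not part of the invention:

```python
from ultralytics import YOLO

# Hypothetical baseline checkpoint and dataset configuration file
model = YOLO("yolov8n-pose.pt")
model.train(
    data="sdust_situp.yaml",      # assumed dataset config: 4 posture classes, 6 keypoints
    imgsz=640, batch=32, epochs=300,
    optimizer="AdamW", lr0=0.01, weight_decay=0.0005,
)
```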
The invention provides an improved key posture estimation and keypoint detection network based on the YOLOv-Pose network, which performs action posture estimation and keypoint detection on video key action frames and performs weighted scoring of sit-up action quality through key angle features; the overall model is named Situp-PoseNet. Table 2 shows the performance of the proposed Situp-PoseNet model and other SOTA methods (PEPoseNet, ST-GCN, GDLT) on the SDUST-Situp dataset, where PEPoseNet and ST-GCN are models based on skeleton keypoints and GDLT is a Transformer framework for video streams.
For the comparison algorithms, no changes were made except that the last scoring module was modified to fit the action quality assessment of sit-ups. In both posture estimation and keypoint detection performance and action quality score prediction performance, Situp-PoseNet shows clear advantages: the mean squared error MSE and the Spearman rank correlation coefficient ρ reach 0.017 and 0.933, respectively, indicating that Situp-PoseNet captures the spatio-temporal correlations of the sit-up keypoints significantly better. Compared with the YOLOv-Pose baseline model, the Spearman rank correlation coefficient is improved by 7.6% and the mean squared error of the predicted scores is reduced by 4.3%.
Table 2 Action quality assessment performance of different models on the SDUST-Situp dataset
The PEPoseNet model incorporates dataset features with pull-up body joint coordinates and sports equipment keypoint coordinates during training, but this training approach may not be fully applicable to sit-up motion feature detection. Its keypoint estimation performance is therefore relatively low, with mAPPose@0.5 of only 83.1%, so its action quality estimation performance is slightly insufficient. In contrast, the ST-GCN model automatically learns spatial and temporal features from the data and has stronger feature expression and generalization capability, making it superior to PEPoseNet in keypoint estimation. However, ST-GCN may produce redundant keypoints, which can affect the scoring network during calculation, so its score prediction performance is relatively poor. GDLT extracts features directly from the video stream, so the motion features are easily affected by background factors such as the environment, and its action quality score prediction performance is mediocre. In terms of the mean squared error of the predicted scores, GDLT is the worst, but its Spearman rank correlation coefficient ρ of 0.753 is not the worst, indicating that its ranking of the predicted score series is better than its error between predicted and true scores. Although PEPoseNet, ST-GCN, and GDLT each have advantages and disadvantages in keypoint estimation and action quality estimation, they all have certain reference value in practical applications.
Tables 3 and 4 list the ablation results for posture detection and keypoint estimation and for the action quality prediction scores, respectively. To verify the effect of adding the GAM attention mechanism and the EMA module on the posture detection and keypoint estimation accuracy of YOLOv-Pose, ablation experiments were conducted for comparison. The baseline YOLOv-Pose model has the weakest posture detection and keypoint estimation capability, with mAPBox@0.5 and mAPBox@0.5-0.95 of 94.8% and 91.4%, and mAPPose@0.5 and mAPPose@0.5-0.95 of 94.8% and 93.5%, respectively; this is because the baseline model is trained on the COCO dataset annotated with 17 human keypoints, so for sit-up actions the posture detection can be redundant and the detected keypoints are more cluttered. After the GAM attention mechanism is added to the baseline model, the model focuses more on the region of interest of the human posture, and the posture detection and keypoint estimation performance improve by 0.1%, 2.2%, 0.1%, and 1.4%, respectively. After the EMA module is added to the baseline model, the model focuses on the skeleton keypoint features, which improves keypoint estimation; posture detection also improves, by 0.6%, 1.7%, 0.6%, and 0.7%, respectively. Finally, when the GAM and EMA modules are added to the baseline model simultaneously, the model further improves keypoint estimation on the basis of extracting the region of interest of the human posture, and the metrics improve by 1.2%, 3%, 1%, and 2.2%, respectively.
Table 3 Ablation experiment results for posture detection and keypoint estimation
The invention also compares the effect on action quality scoring performance of adding the GAM attention mechanism and the EMA module to the YOLOv-Pose model, with the results shown in Table 4. After the GAM module is added to the YOLOv-Pose baseline model, the model pays more attention to the human action posture region of interest and the feature region is narrowed, which improves keypoint extraction and the score prediction results: MSE decreases by 3.2% and ρ increases by 5.6%. After the EMA module is added to the baseline model, the model focuses on the human skeleton keypoint features and the feature extraction capability is enhanced, improving the score prediction results: MSE decreases by 1.1% and ρ increases by 6.6%. When the GAM and EMA modules are added to the baseline model simultaneously, they complement each other, so the model further improves keypoint estimation on the basis of extracting the region of interest, effectively improving the accuracy of score estimation, and the Spearman rank correlation coefficient ρ reaches 0.933.
Table 4 results of ablation experiments for motion quality prediction scores
The experimental results show that the proposed Situp-PoseNet model can accurately score the quality of a single complete sit-up action, with a Spearman rank correlation coefficient of 0.933 and a mean squared error of 0.017. Therefore, in the proposed Situp-PoseNet model, adding the GAM module and the EMA attention mechanism to YOLOv-Pose effectively improves the accuracy of posture recognition and keypoint localization, proving the effectiveness of the improvement.
The number of video frames and the frame size have a direct impact on computational resources and on action quality assessment performance. Provided the assessment performance is maintained, the number of frames and the frame size should be reduced as much as possible, which lowers the computational cost and facilitates real-time detection on portable devices. Table 5 lists the experimental results on the SDUST-Situp dataset with different frame numbers and frame sizes.
Table 5 Evaluation performance on the SDUST-Situp dataset with different frame numbers and frame sizes
As can be seen from Table 5, the frame size has a larger effect than the number of frames. A larger frame size means higher resolution and more detail, and thus better score prediction performance of the model. For example, with 4 frames, the MSE and ρ at a frame size of 320×320 are 0.082 and 0.878, while at a frame size of 1024×1024 they are 0.016 and 0.937, a notable improvement; a similar trend is observed with 8 frames. However, the computation time increases from 3.92 seconds to 8.02 seconds. The choice of frame number and resolution therefore requires a trade-off; from the data in the table, the combination of 4 frames at a frame size of 640×640 can be considered optimal.
The method of the invention provides an improved key posture estimation and keypoint detection network based on the YOLOv-Pose network, performs motion posture estimation and keypoint detection on video key action frames, greatly improves the posture estimation and keypoint detection capability, effectively improves the detection of human postures and the localization of skeleton keypoints, and makes the action quality prediction score of the scoring network more accurate.
The foregoing description is, of course, merely illustrative of preferred embodiments of the present invention, and it should be understood that the present invention is not limited to the above-described embodiments, but is intended to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.