Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12190)

Included in the following conference series:

International Conference on Human-Computer Interaction

Abstract

The use of drones in search and rescue (SAR) missions can be very cognitively demanding. Since high levels of cognitive workload can negatively affect human performance, there is a risk of compromising the mission and leading to failure with catastrophic outcomes. Therefore, cognitive workload monitoring is key to preventing rescuers from making dangerous decisions. Due to the difficulties of gathering data during real SAR missions, we rely on virtual reality. In this work, we use a simulator to induce three levels of cognitive workload related to SAR missions with drones. To detect cognitive workload, we extract features from different physiological signals, such as electrocardiogram, respiration, skin temperature, and photoplethysmography. We propose a recursive feature elimination method that combines an eXtreme Gradient Boosting (XGBoost) algorithm with the SHapley Additive exPlanations (SHAP) score to select the most representative features. Moreover, we address both a binary and a three-class detection approach. To this aim, we investigate the use of different machine-learning algorithms, such as XGBoost, random forest, decision tree, k-nearest neighbors, logistic regression, linear discriminant analysis, Gaussian naïve Bayes, and support vector machine. Our results show that, on an unseen test set extracted from 24 volunteers, an XGBoost with 24 features reaches an accuracy of 80.2% and 62.9% when detecting two and three levels of cognitive workload, respectively. Finally, our results open the door to fine-grained cognitive workload detection in the field of SAR missions.

This work has been partially supported by the NCCR Robotics through the Symbiotic Drone project, and by the ONR-G through the Award Grant No. N62909-17-1-2006.

1 Introduction

The use of drones in search and rescue (SAR) missions can be very cognitively demanding. Rescuers often operate in extreme conditions, under time pressure, and with scarce human resources. Moreover, they have to find strategies to face all types of unexpected events that a SAR mission may hide. Since high levels of cognitive workload can negatively affect human performance [10,13], there is a risk of compromising the mission and leading to failure with catastrophic outcomes. Therefore, a Cognitive Workload Monitoring (CWM) system that detects elevated workload levels could be used to notify the rescuers and prevent them from making dangerous decisions.

Nowadays, there are three main methods to measure cognitive workload: subjective surveys, performance metrics, and assessment of the human physiological response [3,6]. Due to the need for continuous monitoring and the unpredictability of SAR missions, neither surveys nor performance metrics can be used for cognitive workload monitoring of rescuers. In contrast, cognitive workload monitoring from physiological signals, which can be acquired with wearable sensors, is suitable for this application.

Continuous workload monitoring is a challenging task because it is a multidimensional problem. Many factors affect cognitive workload, namely mental, physical, and temporal demands, the overall performance, the frustration level, and the effort [5]. The perceived workload depends on both the environment and the subject: it changes among individuals according to their learning skills and their ability to address and perform a particular task. Despite that, state-of-the-art studies that target cognitive workload detection from physiological signals have shown promising results [3,4,9,12].

The objective of this work is to detect cognitive workload levels from physiological signals in the field of SAR missions with drones. However, due to the unpredictable conditions and the difficulty of obtaining a cognitive workload reference value during real SAR missions, data collection is very difficult. Therefore, we rely on Virtual Reality (VR) to emulate an immersive SAR mission. Our main contributions are as follows:

We induce three levels of cognitive workload using a VR simulator for SAR missions with drones.
We detect high and low levels of cognitive workload with an accuracy, precision, and recall of 80.2%, 79.6%, and 71.7%, respectively, on an unseen dataset, which demonstrates the generalization of our model.
We explore a three-class cognitive workload detection, thus identifying high, medium, and low levels of cognitive workload with an accuracy of 62.9%, which is a promising result considering the complex identification of cognitive workload levels generated in our virtual SAR mission.

The rest of the paper is organized as follows. Section 2 describes the methods applied to induce and detect different levels of cognitive workload. Section 3 describes the setup of the experiment and Sect. 4 reports our results. Finally, in Sect. 5 we draw the main conclusions of this work.

2 Cognitive Workload Characterization and Detection Methods

The general idea of the method applied for estimating cognitive workload, including sensor placement, is shown in Fig. 1. A VR-based simulator is used to induce different levels of cognitive workload by changing the difficulty of the tasks [4]. Then, the cognitive workload is detected from physiological signals by following a standard methodology, which includes signal acquisition, preprocessing, feature extraction, and classification. Finally, an elicitation technique based on a multidimensional assessment tool, the NASA Task Load Index (NASA-TLX) [6], and a questionnaire to evaluate the difficulty of the tasks are used to label the different levels of induced cognitive workload.

2.1 Simulator for Search and Rescue Mission with Drones

The proposed VR-based simulator emulates a simplified SAR mission, where the pilot needs to fly a drone along a pathway and map the damage situation of a disaster area. The simulator was implemented with Unity3D [15] and was used to track the influence of workload on SAR missions with drones in [4]. In the simulator, the flight pathway is shown by 90 waypoints (black rings) distributed every 20 m along a randomly generated trajectory over a village. The speed of the drone is fixed at 6 m/s for all the tasks. The damage situation is represented by cubes that randomly appear over the flying pathway, as shown in Fig. 2. The colors of the cubes are chosen according to the regulation of the Swiss Firefighters; that is, yellow to indicate rescue situations, red for fire, blue for water damage, and green for accidents.
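As a rough consistency check on these figures, the nominal time needed to traverse the pathway can be computed from the waypoint count, the spacing, and the fixed speed. The sketch below assumes one 20 m segment per waypoint along a single path, which is our simplification and not a detail given in the text; the function and constant names are hypothetical.

```python
# Values taken from the text; treating the waypoints as evenly spaced along
# one path (one 20 m segment per waypoint) is an assumption of this sketch.
N_WAYPOINTS = 90
SPACING_M = 20.0
SPEED_M_PER_S = 6.0


def nominal_flight_time_s(n_waypoints: int, spacing_m: float, speed_m_per_s: float) -> float:
    """Time to traverse the pathway at constant speed."""
    path_length_m = n_waypoints * spacing_m
    return path_length_m / speed_m_per_s


flight_time_s = nominal_flight_time_s(N_WAYPOINTS, SPACING_M, SPEED_M_PER_S)
print(f"{flight_time_s:.0f} s (about {flight_time_s / 60:.0f} min)")  # 300 s (about 5 min)
```

Under this assumption the pathway takes about five minutes to fly, which matches the five-minute task duration of Phase 1 given in Sect. 3.1.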

To induce different levels of workload, flying and mapping activities are combined, yielding the following tasks:

Training (T): A training sequence is proposed to let the user become familiar with the simulator. The training sequence combines both flying and mapping activities. The user is asked to fly as close as possible through the center of the waypoints. In addition, the user needs to press the button of the controller corresponding to the color of the objects that are randomly displayed. The number of objects to be mapped is 60 per session, i.e., 15 per color.
Baseline (B): To set a physiological baseline while avoiding as much as possible the effect of uncontrollable variables, a passive task with the same framework as the rest of the experiment is presented. This task is a flying sequence controlled by an auto-pilot. During this task, the user just needs to watch the sequence, without any additional activity.
Flying (F): The flying task is considered a medium level of cognitive workload. During this task, the subject solely needs to fly the drone as close as possible to the center of the waypoints.
Mapping 3 objects (3M): This task is also considered a medium level of cognitive workload. During this task, the subject has to identify three objects of different colors that appear at the same time on the screen and has to press the button of the controller corresponding to the color of each object. The number of objects to be mapped is 240 per session, i.e., 60 per color.
Flying and Mapping 3 objects (F3M): This task is considered a high level of cognitive workload. It is the same as training, including the same combination of flying and mapping activities, but with a more demanding mapping activity. Three objects are displayed simultaneously on the screen, as in the 3M task, instead of only one as in the training phase.

2.2 Cognitive Workload Classification

The proposed methodology to build a model for CWM has three main steps. The first step is the acquisition of different signals, such as Electrocardiogram (ECG), Respiration (RSP), Impedance Cardiogram (ICG), Skin Temperature (SKT), pulse wave through Photoplethysmography (PPG), and Electrodermal Activity (EDA), as selected in [4,9].

The second step is the preprocessing and feature extraction, where we filter the physiological signals and extract several features to capture the functionality of the human physiology, as we did in [1,9]. For the feature extraction we use a 1-minute sliding window without overlap. Then, we apply a Recursive Feature Elimination with Cross Validation (RFECV) to select the physiological features that best characterize different levels of cognitive workload. Our RFECV is based on the eXtreme Gradient Boosting (XGBoost) algorithm and uses the SHapley Additive exPlanations (SHAP) framework for a more detailed prediction interpretation [8].
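The windowing step can be sketched as follows. This is a minimal illustration under stated assumptions, not the feature pipeline of [1,9]: the 4 Hz sampling rate, the toy signal, and the two placeholder features (mean and standard deviation) are hypothetical and chosen only to keep the example short.

```python
from statistics import mean, stdev


def non_overlapping_windows(samples, fs_hz, window_s=60):
    """Split a signal into consecutive windows of window_s seconds, no overlap.

    A trailing partial window is dropped.
    """
    size = int(fs_hz * window_s)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def window_features(window):
    # Placeholder features only; the actual feature set is described in [9].
    return {"mean": mean(window), "std": stdev(window)}


# Toy signal: 3.5 minutes sampled at 4 Hz (hypothetical sampling rate).
fs_hz = 4
signal = [float(i % 7) for i in range(int(3.5 * 60 * fs_hz))]

windows = non_overlapping_windows(signal, fs_hz)
features = [window_features(w) for w in windows]
print(len(windows), len(windows[0]))  # 3 240
```

With non-overlapping windows, each one-minute segment contributes exactly one observation, so a five-minute task yields at most five observations per signal.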

Finally, the third and last step is the cognitive workload classification. In this step, we first investigate several machine learning algorithms to select the model that best fits our classification problem. To this aim, we compare XGBoost, Random Forest (RF), Decision Tree (DT), k-Nearest Neighbors (kNN), Logistic Regression, Linear Discriminant Analysis (LDA), Gaussian Naïve Bayes (GNB), and Support Vector Machine (SVM).

Both RFECV and the machine learning algorithms are trained and validated on the training set using a shuffled Leave-P-Groups-Out Cross-Validation (LPGOCV) with five iterations, each time using 20% of the groups as the validation set. Each group is composed of all the observations recorded in one day that belong to one subject. Unless otherwise specified, the machine learning algorithms are trained and tested with scikit-learn [11] using default parameters.
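The key property of this validation scheme is that all observations of a group fall on the same side of the split. The sketch below illustrates the idea with the standard library only; the function name, the seed, and the toy group labels are hypothetical, and it is not the implementation used in this work.

```python
import random


def shuffled_group_splits(groups, n_iterations=5, val_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx) pairs in which whole groups are held out."""
    unique_groups = sorted(set(groups))
    n_val = max(1, round(val_fraction * len(unique_groups)))
    rng = random.Random(seed)
    for _ in range(n_iterations):
        val_groups = set(rng.sample(unique_groups, n_val))
        train_idx = [i for i, g in enumerate(groups) if g not in val_groups]
        val_idx = [i for i, g in enumerate(groups) if g in val_groups]
        yield train_idx, val_idx


# Toy example: 10 groups (one subject on one day each), 4 observations per group.
groups = [f"g{i // 4:02d}" for i in range(40)]
splits = list(shuffled_group_splits(groups))

for train_idx, val_idx in splits:
    train_groups = {groups[i] for i in train_idx}
    val_groups = {groups[i] for i in val_idx}
    print(len(train_idx), len(val_idx), train_groups.isdisjoint(val_groups))
```

Each iteration holds out 2 of the 10 toy groups (8 observations) and no group appears in both subsets. scikit-learn provides `GroupShuffleSplit` in `sklearn.model_selection` with comparable behavior; whether that class was used here is not stated in the text.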

We apply the same procedure to address both the binary and the three-class classification problems. Finally, to assess the generalizability of our model, a final test is done on an unseen test set, which includes data from 30% of the groups and was set aside from the beginning for this purpose only.

3 Experimental Study

To build a database and investigate the cognitive workload characterization and detection from physiological signals, we perform an experimental study following the protocol shown in Fig. 3, using the VR simulator for SAR missions with drones. A gamepad from Logitech [7] was used to control the simulator. We recorded the physiological signals with the Biopac acquisition set [2], which is CE certified for medical monitoring.

3.1 Study Protocol

Our study protocol starts with a setup phase, where we provide information about the experiment and we place the sensors. Additionally, we propose a training sequence to let the participant get familiar with the simulator. Then, we proceed with two phases of data collection, as shown in Fig. 3.

Phase 1 is composed of four different tasks (i.e., B, F, 3M, and F3M), where the order of F, 3M, and F3M is randomized to avoid any bias related to the order of these activities. Before each resting period, the subjects were asked to rate the perceived level of cognitive workload by completing the NASA-TLX. The duration of each task in Phase 1 is five minutes.

Phase 2 consists of a concatenated sequence of the tasks presented in Phase 1 (i.e., B, F, 3M, and F3M). As in Phase 1, the order of F3M, 3M, and F is randomized. However, in Phase 2 there is no resting period between the tasks. Phase 2 is executed twice in a row, and the NASA-TLX is completed only at the end of each sequence. The duration of each task in Phase 2 is limited to three minutes. The data collected from the second repetition of Phase 2 is set aside for testing our final machine-learning algorithm; the rest of the data is used only for training the machine-learning algorithms.

Participants were asked not to talk, and to avoid as much as possible any kind of unnecessary movements during the main tasks, but they were free to rest and move otherwise.

3.2 Participants

Twenty-four volunteers (6 females and 18 males), aged between 21 and 39 years old (\(27.7 \pm 4.8\)), participated in our study. All but one participated twice, on two different days. The participants were healthy, free of any cardiac abnormalities, and were receiving no medical treatment. The ethical approval for this study was obtained from the Cantonal Ethics Commissions for Human Research Vaud and Geneva; namely, ethical approval application number PB2017-00295.

3.3 Database

From the 47 experimental sessions performed, we build our datasets based on the workload reported by the participants in the NASA-TLX questionnaire. Values higher than 55% and lower than 5% are labeled as high and low workload, respectively. Following these criteria, 259 observations are labeled as low workload and 249 are labeled as high, which yields a dataset of 508 observations. By including the medium level of workload, we extend our dataset to 748 observations.
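The labeling rule and the resulting dataset sizes can be summarized in a few lines. This sketch uses the two thresholds as printed in the text and assumes that a higher NASA-TLX score means a higher workload and that every score between the two thresholds counts as medium; the latter is our reading, since the medium range is not spelled out. The function and constant names are hypothetical.

```python
LOW_MAX = 5.0    # scores below this are labeled "low" (threshold as printed)
HIGH_MIN = 55.0  # scores above this are labeled "high" (threshold as printed)


def label_workload(tlx_score: float) -> str:
    """Map a NASA-TLX score in percent to a workload label."""
    if tlx_score > HIGH_MIN:
        return "high"
    if tlx_score < LOW_MAX:
        return "low"
    return "medium"  # assumption: everything in between is medium


# Observation counts reported in the text.
n_low, n_high, n_three_class = 259, 249, 748
n_binary = n_low + n_high
n_medium = n_three_class - n_binary
print(n_binary, n_medium)  # 508 240
```

The counts imply 240 medium-workload observations, so the three classes are roughly balanced.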

Both datasets are split into two subsets containing 70% and 30% of the groups of observations, for training and testing, respectively. Thus, the test set includes all the observations of 14 groups randomly selected from the second day, while the training set includes all the remaining groups of observations acquired on the first day.

4 Results

Our results include the cognitive workload perceived by the volunteers, which is used as reference. Moreover, we provide results of a binary cognitive workload detection (CWD) and of a three-class CWD.

4.1 Cognitive Workload Reference

Both the tasks’ difficulty perceived by the participants and the cognitive workload level reported with the NASA-TLX procedure after each task are shown in Fig. 4. Each box includes 144 data points, which were averaged across all the trials.

Both cognitive workload and tasks’ difficulty show an increasing trend in the order B, 3M, F, and F3M. Two-tailed t-tests yielded a significant difference between most of the tasks (p-value < 0.001, n = 47), except for the cognitive workload between 3M and F, where the difference is not significant. The significance of the difference in difficulty between tasks 3M and F is limited to a p-value < 0.05. A significant Pearson’s correlation between cognitive workload and tasks’ difficulty is observed (p-value < 0.001, n = 47). The high correlation confirms the validity of the assumption that cognitive workload increases with the difficulty of the tasks.
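For reference, Pearson's correlation coefficient and the t statistic used to test it against zero can be computed as in the sketch below. The five data points are made up for illustration and are not taken from the study; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Made-up ratings, for illustration only.
difficulty = [1, 2, 3, 4, 5]
workload = [10, 25, 30, 45, 60]

r = pearson_r(difficulty, workload)
t = correlation_t_statistic(r, len(difficulty))
print(f"r = {r:.3f}, t = {t:.2f}")
```

The two-tailed p-value then follows from the Student t distribution with n - 2 degrees of freedom, which is not part of the Python standard library and is therefore omitted here.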

4.2 Binary Cognitive Workload Detection

Following our methodology to develop a model for cognitive workload monitoring (see Sect. 2), we first process the physiological signals. In particular, in this study we consider the RSP, ECG, PPG, and SKT signals, from which we extract 156 features as in [9]. Then, we perform the feature selection step.

The result of the RFECV based on SHAP values and using an XGBoost model is shown in Fig. 5. The score increases as features are added until it reaches its maximum with 24 features. Then, it decreases and reaches a plateau when more than 45 features are used. Therefore, we selected 24 features to train an XGBoost classifier, which reaches an LPGOCV accuracy of 79.5%. The 24 most important physiological features characterizing high levels of cognitive workload of the participants are shown in Table 1.
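The selection logic behind such a curve can be sketched as a loop that repeatedly drops the least important feature and keeps the subset with the best validation score. In the sketch below, `importance_fn` stands in for the mean absolute SHAP value obtained from a fitted XGBoost model and `score_fn` stands in for the LPGOCV accuracy; both are replaced by hypothetical toy functions so that the example runs without external libraries, and the code is not the implementation used in this work.

```python
def recursive_feature_elimination(features, importance_fn, score_fn):
    """Drop the least important feature one at a time; return the best subset.

    importance_fn(subset) -> {feature: importance}
    score_fn(subset) -> validation score (higher is better)
    """
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > 1:
        importances = importance_fn(current)
        weakest = min(current, key=lambda name: importances[name])
        current.remove(weakest)
        score = score_fn(current)
        if score >= best_score:  # ties favor the smaller subset
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Hypothetical toy problem: four informative features and six useless ones.
TOY_RELEVANCE = {f"f{i}": w for i, w in enumerate([0.4, 0.3, 0.2, 0.1, 0, 0, 0, 0, 0, 0])}


def toy_importance(subset):
    return {name: TOY_RELEVANCE[name] for name in subset}


def toy_score(subset):
    # Reward informative features and penalize every feature kept.
    return sum(TOY_RELEVANCE[name] for name in subset) - 0.02 * len(subset)


selected, score = recursive_feature_elimination(list(TOY_RELEVANCE), toy_importance, toy_score)
print(sorted(selected))  # ['f0', 'f1', 'f2', 'f3']
```

On the toy problem the score rises while useless features are removed and drops once informative ones are discarded, which mirrors the shape described for Fig. 5.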

Table 1. Description of the selected features.

The comparison of different classification algorithms for the binary classification between low and high workload levels is summarized in Table 2. In particular, we report both training and LPGOCV accuracy before and after feature selection.

Table 2. Comparison of different classification models in terms of accuracy.

As expected, considering the large number of features compared with the limited size of our dataset, the models trained with all the features show a large gap between training and LPGOCV accuracy, which is a clear sign of overfitting. This overfitting problem was addressed by using simpler models (e.g., linear instead of non-linear models) and models with fewer parameters (i.e., lower capacity). Indeed, the use of LDA drastically reduces the gap between training and LPGOCV accuracy, especially when a limited number of features is used. Although the training data is not sufficient to find an optimum, the highest LPGOCV accuracy is given by XGBoost (79.5%). Thus, the results provided next are based on XGBoost only.

Finally, in order to minimize the errors, we fine-tuned the decision threshold of our model on cross-validation. A plot of precision, recall, and accuracy as a function of the decision threshold is shown in Fig. 6.

Although the highest CV accuracy is reached with a threshold around 0.5, which is the default decision threshold in binary classification, the recall is poor. However, by choosing a threshold equal to 0.27 (where precision almost equals recall), we have an equal error rate and the accuracy is still acceptable. A detailed comparison between the use of a threshold at 0.5 and one at 0.27 is reported in Table 3.
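The threshold search can be sketched as a scan over candidate thresholds that keeps the one where precision and recall are closest. The labels, predicted probabilities, and candidate grid below are made up for illustration, and breaking ties by accuracy is a design choice of this sketch, not something stated in the text.

```python
def precision_recall_accuracy(y_true, y_prob, threshold):
    """Binary metrics when predicting class 1 for probabilities >= threshold."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


def balanced_threshold(y_true, y_prob, candidates):
    """Candidate with the smallest |precision - recall|; ties go to higher accuracy."""
    def key(threshold):
        precision, recall, accuracy = precision_recall_accuracy(y_true, y_prob, threshold)
        return (abs(precision - recall), -accuracy)
    return min(candidates, key=key)


# Made-up validation labels and predicted probabilities of high workload.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.04, 0.12, 0.31, 0.42, 0.22, 0.36, 0.61, 0.83]
candidates = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

best = balanced_threshold(y_true, y_prob, candidates)
print(best, precision_recall_accuracy(y_true, y_prob, best))
```

On this toy data the balanced threshold is 0.35, where precision, recall, and accuracy are all 0.75. The accuracy tie-break avoids degenerate thresholds at which no positive is predicted and precision and recall are both zero.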

Table 3. LPGOCV performance of XGBoost with different decision thresholds.

Table 4 reports the results on an unseen test set of the workload detection based on XGBoost with 24 selected features and a threshold at 0.27, which show the generalization power of the final workload detection model.

Table 4. Generalization power of the workload detection model (XGBoost) on the unseen test set.

Although the results obtained in CV suggest the use of an XGBoost with the 24 selected features and a threshold of 0.27, as a comparison we also report in Table 4 the results of an XGBoost before and after feature selection with a decision threshold at 0.5. On the test set as well, accuracy increases and the difference between precision and recall is reduced after applying both feature selection and threshold tuning. Overall, these results show that our model generalizes well for cognitive workload detection.

4.3 3-Class Cognitive Workload Detection

As explained in Sect. 2, we also explore the three-class detection problem using the same procedure that we applied for the binary problem. The RFECV technique with the XGBoost algorithm based on SHAP values also selected 24 features, with a maximum accuracy of 54.7%, which is still clearly better than random (33%). The selected physiological features characterizing low, medium, and high levels of cognitive workload are reported in Table 1.

Table 5. Comparison of different three-class models in terms of accuracy.

From the comparison of the different classification methods reported in Table 5, we can see that XGBoost is again slightly better than the others. The large difference between training and CV accuracy clearly indicates an overfitting problem. Acquiring more data would probably mitigate this problem and increase the performance of some models, such as XGBoost, random forest, and decision tree. Nevertheless, our results suggest selecting again the XGBoost with 24 features to address our three-class problem.

Finally, on an unseen test set, the cognitive workload detection based on the 3-class XGBoost with 24 selected features reaches an accuracy, an averaged precision, and an averaged recall of 62.9% each.
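One averaging scheme under which accuracy, precision, and recall necessarily coincide in a single-label multi-class problem is micro-averaging, because every misclassified sample counts once as a false positive (for the predicted class) and once as a false negative (for the true class). The text does not state which averaging was used, so the sketch below only illustrates this property on a made-up confusion matrix.

```python
def micro_averaged_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    true_positives = sum(confusion[i][i] for i in range(n_classes))
    # Each off-diagonal entry is a false positive for class j
    # and a false negative for class i.
    errors = sum(
        confusion[i][j]
        for i in range(n_classes)
        for j in range(n_classes)
        if i != j
    )
    precision = true_positives / (true_positives + errors)
    recall = true_positives / (true_positives + errors)
    accuracy = true_positives / total
    return precision, recall, accuracy


# Made-up confusion matrix for low, medium, and high workload.
toy_confusion = [
    [30, 8, 2],
    [10, 20, 10],
    [3, 9, 28],
]
micro_precision, micro_recall, micro_accuracy = micro_averaged_metrics(toy_confusion)
print(micro_precision, micro_recall, micro_accuracy)  # 0.65 0.65 0.65
```

Macro-averaged values, by contrast, are computed per class and then averaged, and in general differ from the accuracy.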

5 Conclusion

The goal of this work was to detect cognitive workload levels from physiological signals in the field of SAR missions with drones. Although the volunteers were not exposed to the same stressful conditions as they would be in a real SAR mission, with virtual reality we were able to induce different levels of cognitive workload that are related to SAR missions with drones. From the collected dataset we compared different machine-learning algorithms and found that an XGBoost with only 24 features is enough to address both a binary and a three-class problem. The two-class XGBoost was able to detect high and low levels of cognitive workload with an accuracy of 80.2%. The three-class XGBoost was able to detect high, medium, and low levels of cognitive workload with an accuracy of 62.9%. Considering the small difference between the levels of induced workload, our results are promising and open the door to fine-grained cognitive workload detection in the field of SAR missions.

References

1. Arza, A., et al.: Measuring acute stress response through physiological signals: towards a quantitative assessment of stress. Med. Biol. Eng. Comput. 57(1), 271–287 (2018). https://doi.org/10.1007/s11517-018-1879-z
2. Biopac: MP160 Data Acquisition Systems. https://www.biopac.com/product/mp150-data-acquisition-systems/
3. Cain, B.: A Review of the Mental Workload Literature. Defence Research and Development Toronto, Canada (2007). http://www.dtic.mil/dtic/tr/fulltext/u2/a474193.pdf
4. Dell’Agnola, F., Cammoun, L., Atienza, D.: Physiological characterization of need for assistance in rescue missions with drones. In: IEEE International Conference on Consumer Electronics (ICCE), pp. 1–6, January 2018. https://infoscience.epfl.ch/record/261415
5. Evans, D.C., Fendley, M.: A multi-measure approach for connecting cognitive workload and automation. Int. J. Hum.-Comput. Stud. 97, 182–189 (2017). https://www.sciencedirect.com/science/article/pii/S1071581916300623
6. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (task load index): results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, Advances in Psychology, vol. 52, pp. 139–183. North-Holland (1988). http://www.sciencedirect.com/science/article/pii/S0166411508623869
7. Logitech: Gamepad F310. http://support.logitech.com/en/product/gamepad-f310
8. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
9. Momeni, N., Dell’Agnola, F., Arza, A., Atienza, D.: Real-time cognitive workload monitoring based on machine learning using physiological signals in rescue missions. Technical report, EPFL, April 2019. https://infoscience.epfl.ch/record/265352
10. Moray, N.: Mental Workload: Its Theory and Measurement. Published in coordination with NATO Scientific Affairs, Plenum Press (1979). https://books.google.ch/books?id=SP3lBwAAQBAJ&lr=&source=gbs_navlinks_s
11. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
12. Ranchet, M., Morgan, J.C., Akinwuntan, A.E., Devos, H.: Cognitive workload across the spectrum of cognitive impairments: a systematic review of physiological measures. Neurosci. Biobehav. Rev. 80, 516–537 (2017). https://linkinghub.elsevier.com/retrieve/pii/S0149763416305413
13. Teigen, K.H.: Yerkes-Dodson: a law for all seasons. Theory Psychol. 4(4), 525–547 (1994). https://doi.org/10.1177/0959354394044004
14. Toichi, M., Sugiura, T., Murai, T., Sengoku, A.: A new method of assessing cardiac autonomic function and its comparison with spectral analysis and coefficient of variation of R-R interval. J. Auton. Nerv. Syst. 62(1), 79–84 (1997). http://www.sciencedirect.com/science/article/pii/S0165183896001129
15. Unity Technologies, I.: Unity3D. https://unity3d.com/

Author information

Authors and Affiliations

Embedded Systems Laboratory of Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland
Fabio Dell’Agnola, Niloofar Momeni, Adriana Arza & David Atienza

Corresponding authors

Correspondence to Fabio Dell’Agnola, Niloofar Momeni, Adriana Arza or David Atienza.

Editor information

Editors and Affiliations

U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, USA
Jessie Y. C. Chen
U.S. Army Combat Capabilities Development Command, Orlando, FL, USA
Gino Fragomeni

Cite this paper

Dell’Agnola, F., Momeni, N., Arza, A., Atienza, D. (2020). Cognitive Workload Monitoring in Virtual Reality Based Rescue Missions with Drones. In: Chen, J.Y.C., Fragomeni, G. (eds) Virtual, Augmented and Mixed Reality. Design and Interaction. HCII 2020. Lecture Notes in Computer Science, vol 12190. Springer, Cham. https://doi.org/10.1007/978-3-030-49695-1_26

DOI: https://doi.org/10.1007/978-3-030-49695-1_26
Published: 10 July 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49694-4
Online ISBN: 978-3-030-49695-1
eBook Packages: Computer Science, Computer Science (R0)
