- Fabio Dell’Agnola (ORCID: orcid.org/0000-0002-3556-2578),
- Niloofar Momeni (ORCID: orcid.org/0000-0002-9435-9772),
- Adriana Arza (ORCID: orcid.org/0000-0002-4190-7878) &
- …
- David Atienza (ORCID: orcid.org/0000-0001-9536-4947)
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12190)
Abstract
The use of drones in search and rescue (SAR) missions can be very cognitively demanding. Since high levels of cognitive workload can negatively affect human performance, there is a risk of compromising the mission and leading to failure with catastrophic outcomes. Therefore, cognitive workload monitoring is key to preventing rescuers from making dangerous decisions. Due to the difficulties of gathering data during real SAR missions, we rely on virtual reality. In this work, we use a simulator to induce three levels of cognitive workload related to SAR missions with drones. To detect cognitive workload, we extract features from different physiological signals, such as electrocardiogram, respiration, skin temperature, and photoplethysmography. We propose a recursive feature elimination method that combines an eXtreme Gradient Boosting (XGBoost) algorithm with the SHapley Additive exPlanations (SHAP) score to select the most representative features. Moreover, we address both binary and three-class detection approaches. To this aim, we investigate different machine-learning algorithms, such as XGBoost, random forest, decision tree, k-nearest neighbors, logistic regression, linear discriminant analysis, Gaussian naïve Bayes, and support vector machine. Our results show that, on an unseen test set extracted from 24 volunteers, an XGBoost with 24 features reaches an accuracy of 80.2% and 62.9% for detecting two and three levels of cognitive workload, respectively. Finally, our results open the door to fine-grained cognitive workload detection in the field of SAR missions.
This work has been partially supported by the NCCR Robotics through the Symbiotic Drone project, and by the ONR-G through the Award Grant No. N62909-17-1-2006.
Keywords
- Cognitive Workload Monitoring
- Physiological signals
- Machine learning
- Search and rescue missions
- Simulator
- Drones
1 Introduction
The use of drones in search and rescue (SAR) missions can be very cognitively demanding. Rescuers often operate in extreme conditions, under time pressure, and with scarce human resources. Moreover, they have to find strategies to face all types of unexpected events that a SAR mission may hide. Since high levels of cognitive workload can negatively affect human performance [10,13], there is a risk of compromising the mission and leading to failure with catastrophic outcomes. Therefore, a Cognitive Workload Monitoring (CWM) system that detects elevated workload levels could be used to notify the rescuers and prevent them from making dangerous decisions.
Nowadays, there are three main methods to measure cognitive workload: subjective surveys, performance metrics, and assessment of the human physiological response [3,6]. Because SAR missions require continuous monitoring and are unpredictable, neither surveys nor performance metrics can be used for cognitive workload monitoring of rescuers. In contrast, cognitive workload monitoring from physiological signals, which can be acquired with wearable sensors, is suitable for this application.
Continuous workload monitoring is a challenging task because it is a multidimensional problem. Many factors affect cognitive workload: mental, physical, and temporal demands, the overall performance, the frustration level, and the effort [5]. The perceived workload depends on both the environment and the subject; it changes among individuals according to their learning skills and their ability to address and perform a particular task. Despite that, state-of-the-art studies that target cognitive workload detection from physiological signals have shown promising results [3,4,9,12].
The objective of this work is to detect cognitive workload levels from physiological signals in the field of SAR missions with drones. However, due to the unpredictable conditions and the difficulty of obtaining a cognitive workload reference value during real SAR missions, data collection is very difficult. Therefore, we rely on Virtual Reality (VR) to emulate an immersive SAR mission. Our main contributions are as follows:
- We induce three levels of cognitive workload using a VR simulator for SAR missions with drones.
- We detect high and low levels of cognitive workload with an accuracy, precision, and recall of 80.2%, 79.6%, and 71.7%, respectively, on an unseen dataset, which demonstrates the generalization of our model.
- We explore a three-class cognitive workload detection, thus identifying high, medium, and low levels of cognitive workload with an accuracy of 62.9%, which is a promising result considering the complex identification of cognitive workload levels generated in our virtual SAR mission.
The rest of the paper is organized as follows. Section 2 describes the methods applied to induce and detect different levels of cognitive workload. Section 3 describes the setup of the experiment and Sect. 4 reports our results. Finally, in Sect. 5 we draw the main conclusions of this work.
2 Cognitive Workload Characterization and Detection Methods
The general idea of the method applied for estimating cognitive workload, including sensor placement, is shown in Fig. 1. A VR-based simulator is used to induce different levels of cognitive workload by changing the difficulty of the tasks [4]. Then, the cognitive workload is detected from physiological signals by following a standard methodology, which includes signal acquisition, preprocessing, feature extraction, and classification. Finally, an elicitation technique based on a multidimensional assessment tool, the NASA Task Load Index (NASA-TLX) [6], and a questionnaire to evaluate the difficulty of the tasks are used to label the different levels of induced cognitive workload.
Fig. 1. Cognitive workload (WL) estimation from physiological signals, such as Electrocardiogram (ECG), Respiration (RSP), Impedance Cardiogram (ICG), Skin Temperature (SKT), Photoplethysmography (PPG), and Electrodermal Activity (EDA).
2.1 Simulator for Search and Rescue Mission with Drones
The proposed VR-based simulator emulates a simplified SAR mission, where the pilot needs to fly a drone along a pathway and map the damage situation of a disaster area. The simulator was implemented with Unity3D [15] and used to track the workload influence on SAR missions with drones in [4]. In the simulator, the flight pathway is shown by 90 waypoints (black rings) distributed every 20 m along a randomly generated trajectory over a village. The speed of the drone is fixed at 6 m/s for all the tasks. The damage situation is represented by cubes that randomly appear over the flying pathway, as shown in Fig. 2. The colors of the cubes are chosen according to the regulation of the Swiss Firefighters; that is, yellow to indicate rescue situations, red for fire, blue for water damage, and green for accidents.
Fig. 2. Simulator for search and rescue missions with drones. Overview of different conditions: flying and mapping (F3M), flying (F), and training (T). (Color figure online)
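As a quick consistency check on the simulator parameters, the sketch below estimates how long one pass along the pathway takes. It assumes that the path length can be approximated as the number of waypoints times their spacing, which the text does not state explicitly. Under that assumption, the flight time matches the five-minute task duration used in Phase 1 of the protocol (Sect. 3.1).

```python
# Values taken from the simulator description above.
N_WAYPOINTS = 90          # waypoints along the trajectory
WAYPOINT_SPACING_M = 20   # distance between consecutive waypoints, in meters
DRONE_SPEED_M_S = 6       # fixed drone speed, in meters per second

# Assumption: path length is approximated as waypoints x spacing.
path_length_m = N_WAYPOINTS * WAYPOINT_SPACING_M
flight_time_s = path_length_m / DRONE_SPEED_M_S

print(f"path length: {path_length_m} m")
print(f"flight time: {flight_time_s:.0f} s ({flight_time_s / 60:.0f} min)")
```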
To induce different levels of workload, flying and mapping activities are combined, yielding the following tasks:
- Training (T): A training sequence is proposed to let the user become familiar with the simulator. The training sequence combines both flying and mapping activities. The user is asked to fly as close as possible through the center of the waypoints. In addition, the user needs to press the button of the controller that corresponds to the color of the objects that are randomly displayed. The number of objects to be mapped is 60 per session, i.e., 15 per color.
- Baseline (B): To set a physiological baseline while avoiding as much as possible the effect of uncontrollable variables, a passive task with the same framework as the rest of the experiment is presented. This task is a flying sequence controlled by an auto-pilot. During this task, the user just needs to watch the sequence, without any additional activity.
- Flying (F): The flying task is considered a medium level of cognitive workload. During this task, the subject solely needs to fly the drone as close as possible to the center of the waypoints.
- Mapping 3 objects (3M): This task is also considered a medium level of cognitive workload. During this task, the subject has to identify three objects of different colors that appear at the same time on the screen and press the button of the controller corresponding to the color of each object. The number of objects to be mapped is 240 per session, i.e., 60 per color.
- Flying and Mapping 3 objects (F3M): This task is considered a high level of cognitive workload. It is the same as training, including the same combination of flying and mapping activities, but with a more demanding mapping activity. Three objects are displayed simultaneously on the screen, as in the 3M task, instead of only one as in the training phase.
2.2 Cognitive Workload Classification
The proposed methodology to build a model for CWM has three main steps. The first step is the acquisition of different signals, such as Electrocardiogram (ECG), Respiration (RSP), Impedance Cardiogram (ICG), Skin Temperature (SKT), pulse wave through Photoplethysmography (PPG), and Electrodermal Activity (EDA), as selected in [4,9].
The second step is the preprocessing and feature extraction, where we filter the physiological signals and extract several features to capture the functionality of the human physiology, as we did in [1,9]. For the feature extraction we use a 1-minute sliding window without overlap. Then, we apply a Recursive Feature Elimination with Cross Validation (RFECV) to select the physiological features that best characterize different levels of cognitive workload. Our RFECV is based on the eXtreme Gradient Boosting (XGBoost) algorithm and uses the SHapley Additive exPlanations (SHAP) framework for a more detailed prediction interpretation [8].
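The windowing step can be illustrated with a minimal sketch. It cuts a signal into consecutive 1-minute windows without overlap and computes one feature vector per window. The sampling rate, the toy signal, and the two features (mean and standard deviation) are assumptions chosen for illustration only; they stand in for the physiological features used in this work.

```python
from statistics import mean, stdev


def split_into_windows(samples, sampling_rate_hz, window_s=60):
    """Cut a signal into consecutive, non-overlapping windows.

    A trailing partial window is discarded.
    """
    window_len = int(sampling_rate_hz * window_s)
    n_windows = len(samples) // window_len
    return [samples[i * window_len:(i + 1) * window_len] for i in range(n_windows)]


def window_features(window):
    """Two placeholder features standing in for the physiological ones."""
    return {"mean": mean(window), "std": stdev(window)}


# Hypothetical 4 Hz signal lasting 5 minutes (values are synthetic).
signal = [float(i % 7) for i in range(4 * 60 * 5)]
windows = split_into_windows(signal, sampling_rate_hz=4)
features = [window_features(w) for w in windows]
print(len(features), "observations from a 5-minute recording")
```

With this scheme, each minute of a task yields one observation for the classifier.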
Finally, the third and last step is the cognitive workload classification. In this step, we first investigate several machine learning algorithms to select the model that best fits our classification problem. To this aim, we compare XGBoost, Random Forest (RF), Decision Tree (DT), k-Nearest Neighbors (kNN), Logistic Regression, Linear Discriminant Analysis (LDA), Gaussian Naïve Bayes (GNB), and Support Vector Machine (SVM).
Both RFECV and the machine learning algorithms are trained and validated on the training set using a shuffled Leave-P-Groups-Out Cross-Validation (LPGOCV) with five iterations, each time taking 20% of the groups as the validation set. Each group is composed of all observations recorded in one day that belong to one subject. Unless otherwise specified, the machine learning algorithms are trained and tested with scikit-learn [11] using default parameters.
We apply the same procedure to address both the binary and the three-class classification problems. Finally, to assess the generalizability of our model, a final test is done on an unseen test set, which includes data from 30% of the groups and was set aside from the beginning for this purpose only.
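The shuffled group-wise validation scheme described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code used in this work: the function name, the seed, and the synthetic group labels are hypothetical. scikit-learn's `GroupShuffleSplit` provides comparable behavior.

```python
import random


def shuffled_group_splits(groups, n_iterations=5, validation_fraction=0.2, seed=0):
    """Yield (train_idx, val_idx) pairs; each group falls entirely on one side."""
    rng = random.Random(seed)
    unique_groups = sorted(set(groups))
    n_val = max(1, round(validation_fraction * len(unique_groups)))
    for _ in range(n_iterations):
        val_groups = set(rng.sample(unique_groups, n_val))
        train_idx = [i for i, g in enumerate(groups) if g not in val_groups]
        val_idx = [i for i, g in enumerate(groups) if g in val_groups]
        yield train_idx, val_idx


# Hypothetical labels: one "subject_day" group label per observation,
# 10 subjects x 2 days x 3 observations.
groups = [f"s{s}_d{d}" for s in range(10) for d in (1, 2) for _ in range(3)]
splits = list(shuffled_group_splits(groups))

for train_idx, val_idx in splits:
    train_groups = {groups[i] for i in train_idx}
    val_groups = {groups[i] for i in val_idx}
    # No group contributes observations to both sides of the split.
    assert train_groups.isdisjoint(val_groups)

print(len(splits), "iterations,", len(splits[0][1]), "validation observations each")
```

Splitting by group rather than by observation keeps all data from one subject and day together, so validation scores are not inflated by near-duplicate observations from the same session.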
3 Experimental Study
To build a database and investigate cognitive workload characterization and detection from physiological signals, we performed an experimental study following the protocol shown in Fig. 3, using the VR simulator for SAR missions with drones. A Logitech gamepad [7] was used to control the simulator. We recorded the physiological signals with the Biopac acquisition set [2], which is CE certified for medical monitoring.
3.1 Study Protocol
Our study protocol starts with a setup phase, where we provide information about the experiment and we place the sensors. Additionally, we propose a training sequence to let the participant get familiar with the simulator. Then, we proceed with two phases of data collection, as shown in Fig. 3.
Fig. 3. Protocol of the experiment, which includes different conditions, such as baseline (B), mapping activity (3M), flying activity (F), flying and mapping performed simultaneously (F3M), and filling a questionnaire (Q) to evaluate the NASA-TLX.
Phase 1 is composed of four different tasks (i.e., B, F, 3M, and F3M), where the order of F, 3M, and F3M is randomized to avoid any bias related to the order of these activities. Before each resting period, the subjects were asked to rate the perceived level of cognitive workload by evaluating the NASA-TLX. The time duration for each task in Phase 1 is five minutes.
Phase 2 consists of a concatenated sequence of the tasks presented in Phase 1 (i.e., B, F, 3M, and F3M). As in Phase 1, the order of F3M, 3M, and F is randomized. However, in Phase 2 there is no resting period between the tasks. Phase 2 is executed twice in a row, and the NASA-TLX is evaluated only at the end of each sequence. The task duration in Phase 2 is limited to three minutes. The data collected from the second repetition of Phase 2 is kept apart for testing our final machine-learning algorithm; the rest of the data is used only for training the machine-learning algorithms.
Participants were asked not to talk, and to avoid as much as possible any kind of unnecessary movements during the main tasks, but they were free to rest and move otherwise.
3.2 Participants
Twenty-four volunteers (6 females and 18 males), aged between 21 and 39 years (\(27.7 \pm 4.8\)), participated in our study. All but one participated twice, on two different days. The participants were healthy, free of any cardiac abnormalities, and were receiving no medical treatment. The ethical approval for this study was obtained from the Cantonal Ethics Commissions for Human Research Vaud and Geneva; namely, ethical approval application number PB2017-00295.
3.3 Database
From the 47 experimental sessions performed, we build our datasets based on the workload reported by the participants in the NASA-TLX questionnaire. Values lower than 5% and higher than 55% are labeled as low and high workload, respectively. Following these criteria, 259 observations are labeled as low workload and 249 are labeled as high, which makes a dataset of 508 observations. By including the medium level of workload, we extend our dataset to 748 observations.
Both datasets are split into two subsets containing 70% and 30% of the groups of observations, for training and testing, respectively. Thus, the test set includes all the observations of 14 groups randomly selected from the second day, while the training set includes all the remaining groups of observations acquired the first day.
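The labeling rule can be sketched as a small function. The two cut-offs are the percentages quoted in the text; the function name and the handling of values exactly at a cut-off are assumptions. The sketch also assumes that a higher NASA-TLX score means a higher workload.

```python
def label_workload(tlx_percent, low_max=5.0, high_min=55.0):
    """Map a NASA-TLX score (in percent) to a workload label.

    Assumption: scores exactly equal to a cut-off are treated as "medium".
    """
    if tlx_percent < low_max:
        return "low"
    if tlx_percent > high_min:
        return "high"
    return "medium"


# Hypothetical scores, for illustration only.
for score in (2.0, 30.0, 80.0):
    print(score, "->", label_workload(score))
```

For the binary problem, observations labeled "medium" would simply be left out; for the three-class problem, all three labels are kept.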
4 Results
Our results include the cognitive workload perceived by the volunteers, which is used as reference. Moreover, we provide results of a binary cognitive workload detection (CWD) and results of a three-class CWD.
4.1 Cognitive Workload Reference
Both the tasks’ difficulty perceived by the participants and the cognitive workload level reported with the NASA-TLX procedure after each task are shown in Fig. 4. Each box includes 144 data points, which were averaged across all the trials.
Fig. 4. Perceived workload per task. Different conditions: baseline (B); mapping (3M); flying (F); and flying and mapping (F3M).
Both cognitive workload and tasks’ difficulty show an increasing trend in the order of B, 3M, F, and F3M. Two-tailed t-tests yielded a significant difference between most of the tasks (p-value < 0.001, n = 47), except for the cognitive workload between 3M and F, where the difference is not significant. The significance of the difference in difficulty between tasks 3M and F is limited to a p-value < 0.05. A significant Pearson correlation between cognitive workload and tasks’ difficulty is observed (p-value < 0.001, n = 47). The high correlation confirms the validity of the assumption that cognitive workload increases with the difficulty of the tasks.
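For readers unfamiliar with the statistic, the Pearson correlation coefficient can be computed as in the sketch below. The numbers are made-up placeholders for per-task mean ratings, not data from this study, and the sketch does not compute the p-value.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up illustrative ratings for four tasks (not measured values).
workload = [10, 35, 40, 70]
difficulty = [1, 4, 5, 8]
r = pearson_r(workload, difficulty)
print(f"r = {r:.3f}")
```

A value of r close to 1 indicates that the two ratings rise together, which is the pattern the assumption relies on.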
4.2 Binary Cognitive Workload Detection
Following our methodology to develop a model for cognitive workload monitoring (see Sect. 2), we first process the physiological signals. In particular, in this study we consider the RSP, ECG, PPG, and SKT signals, from which we extract 156 features as in [9]. Then, we perform the feature selection step.
The result of the RFECV based on SHAP values and using an XGBoost model is shown in Fig. 5. The score increases as features are added until it reaches its maximum with 24 features. Then, it decreases and reaches a plateau if more than 45 features are used. Finally, we select 24 features to train an XGBoost classifier, which reaches an LPGOCV accuracy of 79.5%. The 24 most important physiological features characterizing high levels of cognitive workload of the participants are shown in Table 1.
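The elimination loop behind this kind of feature selection can be sketched generically. In the sketch, `importance_fn` and `score_fn` are hypothetical callbacks: in the setting described above they would correspond to SHAP-based importances of an XGBoost model and to the LPGOCV accuracy, respectively, but here they are replaced by toy functions so that the example is self-contained. It is an illustration of the technique, not the implementation used in this work.

```python
def recursive_feature_elimination(features, importance_fn, score_fn):
    """Drop the least important feature one at a time and keep the best subset.

    importance_fn(subset) -> {feature: importance}
    score_fn(subset)      -> float, higher is better
    """
    remaining = list(features)
    best_subset, best_score = list(remaining), score_fn(remaining)
    history = [(len(remaining), best_score)]
    while len(remaining) > 1:
        importances = importance_fn(remaining)
        weakest = min(remaining, key=lambda f: importances[f])
        remaining.remove(weakest)
        score = score_fn(remaining)
        history.append((len(remaining), score))
        if score >= best_score:  # ties favor the smaller subset
            best_subset, best_score = list(remaining), score
    return best_subset, best_score, history


# Toy setup: three informative features, three uninformative ones.
usefulness = {"f0": 0.15, "f1": 0.10, "f2": 0.05, "f3": 0.0, "f4": 0.0, "f5": 0.0}


def toy_importance(subset):
    return {f: usefulness[f] for f in subset}


def toy_score(subset):
    # Each extra feature costs a little, mimicking overfitting on small data.
    return 0.5 + sum(usefulness[f] for f in subset) - 0.01 * len(subset)


best_subset, best_score, history = recursive_feature_elimination(
    list(usefulness), toy_importance, toy_score
)
print(best_subset, round(best_score, 2))
```

The `history` list holds the score for each subset size, which is the kind of curve one would plot to locate the maximum.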
The comparison of different classification algorithms for the binary classification between low and high workload levels is summarized in Table 2. In particular, we report both training and LPGOCV accuracy before and after feature selection.
As expected, considering the large number of features compared with the limited size of our dataset, the models trained with all the features show a large gap between training and LPGOCV accuracy, which is a clear sign of overfitting. This overfitting problem was addressed by using simpler models (e.g., linear instead of non-linear models) and models with fewer parameters (i.e., lower capacity). Indeed, the use of LDA drastically reduces the gap between training and LPGOCV accuracy, especially when a limited number of features is used. Although the training data is not sufficient to find an optimum, the highest LPGOCV accuracy is given by XGBoost (79.5%). Thus, the results provided next are based on XGBoost only.
Finally, in order to minimize the errors, we fine-tuned the decision threshold of our model on cross-validation. A plot of precision, recall, and accuracy as a function of the decision threshold is shown in Fig. 6.
Although the highest CV accuracy is reached with a threshold around 0.5, which is the default decision threshold in binary classification, the recall is poor. However, by choosing a threshold equal to 0.27 (where precision almost equals recall), we have an equal error rate and the accuracy is still acceptable. A detailed comparison between the use of a threshold at 0.5 and one at 0.27 is reported in Table 3.
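The idea of choosing the threshold at which precision and recall meet can be sketched as follows. The labels and predicted probabilities are synthetic, and the function names and the candidate grid are assumptions; the sketch only illustrates the selection rule, not the values obtained in this work.

```python
def precision_recall_accuracy(y_true, y_prob, threshold):
    """Binary metrics when predicting class 1 for probabilities >= threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(y_true)
    return precision, recall, accuracy


def balanced_threshold(y_true, y_prob, candidates):
    """Return the candidate threshold with the smallest |precision - recall|."""
    def gap(threshold):
        precision, recall, _ = precision_recall_accuracy(y_true, y_prob, threshold)
        if recall == 0.0:  # skip the degenerate case with no true positives
            return float("inf")
        return abs(precision - recall)
    return min(candidates, key=gap)


# Synthetic labels (1 = high workload) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.07, 0.12, 0.32, 0.47, 0.22, 0.37, 0.62, 0.92]
candidates = [i / 100 for i in range(5, 100, 5)]

best = balanced_threshold(y_true, y_prob, candidates)
print(best, precision_recall_accuracy(y_true, y_prob, best))
```

In practice the probabilities would come from cross-validated predictions, so that the threshold is not tuned on the data used for the final test.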
Table 4 reports the results on an unseen test set for the workload detection based on XGBoost with the 24 selected features and a threshold of 0.27, which show the generalization power of the final workload detection model.
Although the results obtained in CV suggest the use of an XGBoost with the 24 selected features and a threshold of 0.27, for comparison we also report in Table 4 the results of an XGBoost before and after feature selection with a decision threshold of 0.5. On the test set too, applying both feature selection and threshold tuning increases the accuracy and reduces the difference between precision and recall. Overall, these results show that our model generalizes well for cognitive workload detection.
4.3 3-Class Cognitive Workload Detection
As explained in Sect. 2, we also explore the three-class detection problem using the same procedure that we applied for the binary problem. The applied RFECV technique with XGBoost algorithm based on SHAP values also selected 24 features with a maximum accuracy of 54.7%, which is still clearly better than random (33%). The selected physiological features characterizing low, medium, and high levels of cognitive workload are reported in Table 1.
From the comparison of the different classification methods reported in Table 5, we can see that XGBoost is again slightly better than the others. The large difference between training and CV accuracy clearly indicates an overfitting problem. Acquiring more data will probably mitigate this problem and increase the performance of some models, such as XGBoost, random forest, and decision tree. However, our results suggest selecting the XGBoost with 24 features again to address our three-class problem.
Finally, on an unseen test set, the cognitive workload detection based on the 3-class XGBoost with 24 selected features shows an accuracy and both averaged precision and recall of 62.9%.
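The text does not state how precision and recall are averaged across the three classes. One observation that may explain why all three metrics coincide: with micro-averaging in a single-label multi-class problem, precision and recall are always equal to accuracy, whereas macro-averaged values generally differ. The sketch below shows this on a made-up confusion matrix; the counts are illustrative only and are not results from this study.

```python
def averaged_metrics(confusion):
    """confusion[i][j] = number of class-i observations predicted as class j."""
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = [confusion[k][k] for k in range(n_classes)]
    fp = [sum(confusion[i][k] for i in range(n_classes)) - tp[k] for k in range(n_classes)]
    fn = [sum(confusion[k]) - tp[k] for k in range(n_classes)]
    return {
        "accuracy": sum(tp) / total,
        # Micro-averaging pools the counts of all classes before dividing.
        "micro_precision": sum(tp) / (sum(tp) + sum(fp)),
        "micro_recall": sum(tp) / (sum(tp) + sum(fn)),
        # Macro-averaging computes the metric per class, then averages.
        "macro_precision": sum(tp[k] / (tp[k] + fp[k]) for k in range(n_classes)) / n_classes,
        "macro_recall": sum(tp[k] / (tp[k] + fn[k]) for k in range(n_classes)) / n_classes,
    }


# Made-up counts for low, medium, and high workload (rows: true, columns: predicted).
confusion = [
    [30, 8, 2],
    [10, 20, 10],
    [3, 7, 30],
]
metrics = averaged_metrics(confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```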
5 Conclusion
The goal of this work was to detect cognitive workload levels from physiological signals in the field of SAR missions with drones. Although the volunteers were not exposed to the same stressful conditions as they would be in a real SAR mission, with virtual reality we were able to induce different levels of cognitive workload that are related to SAR missions with drones. From the collected dataset we compared different machine-learning algorithms and found that an XGBoost with only 24 features is enough to address both a binary and a three-class problem. The two-class XGBoost was able to detect high and low levels of cognitive workload with an accuracy of 80.2%. The three-class XGBoost was able to detect high, medium, and low levels of cognitive workload with an accuracy of 62.9%. Considering the small difference between the levels of induced workload, our results are promising and open the door to fine-grained cognitive workload detection in the field of SAR missions.
References
1. Arza, A., et al.: Measuring acute stress response through physiological signals: towards a quantitative assessment of stress. Med. Biol. Eng. Comput. 57(1), 271–287 (2018). https://doi.org/10.1007/s11517-018-1879-z
2. Biopac: MP160 Data Acquisition Systems. https://www.biopac.com/product/mp150-data-acquisition-systems/
3. Cain, B.: A Review of the Mental Workload Literature. Defence Research and Development Toronto, Canada (2007). http://www.dtic.mil/dtic/tr/fulltext/u2/a474193.pdf
4. Dell’Agnola, F., Cammoun, L., Atienza, D.: Physiological characterization of need for assistance in rescue missions with drones. In: IEEE International Conference on Consumer Electronics (ICCE), pp. 1–6, January 2018. https://infoscience.epfl.ch/record/261415
5. Evans, D.C., Fendley, M.: A multi-measure approach for connecting cognitive workload and automation. Int. J. Hum.-Comput. Stud. 97, 182–189 (2017). https://www.sciencedirect.com/science/article/pii/S1071581916300623
6. Hart, S.G., Staveland, L.E.: Development of NASA-TLX (task load index): results of empirical and theoretical research. In: Hancock, P.A., Meshkati, N. (eds.) Human Mental Workload, Advances in Psychology, vol. 52, pp. 139–183. North-Holland (1988). http://www.sciencedirect.com/science/article/pii/S0166411508623869
7. Logitech: Gamepad F310. http://support.logitech.com/en/product/gamepad-f310
8. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions. In: Guyon, I., et al. (eds.) Advances in Neural Information Processing Systems 30, pp. 4765–4774. Curran Associates, Inc. (2017). http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
9. Momeni, N., Dell’Agnola, F., Arza, A., Atienza, D.: Real-time cognitive workload monitoring based on machine learning using physiological signals in rescue missions. Technical report, EPFL, April 2019. https://infoscience.epfl.ch/record/265352
10. Moray, N.: Mental Workload: Its Theory and Measurement. Published in coordination with NATO Scientific Affairs, Plenum Press (1979). https://books.google.ch/books?id=SP3lBwAAQBAJ&lr=&source=gbs_navlinks_s
11. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
12. Ranchet, M., Morgan, J.C., Akinwuntan, A.E., Devos, H.: Cognitive workload across the spectrum of cognitive impairments: a systematic review of physiological measures. Neurosci. Biobehav. Rev. 80, 516–537 (2017). https://linkinghub.elsevier.com/retrieve/pii/S0149763416305413
13. Teigen, K.H.: Yerkes-Dodson: a law for all seasons. Theory Psychol. 4(4), 525–547 (1994). https://doi.org/10.1177/0959354394044004
14. Toichi, M., Sugiura, T., Murai, T., Sengoku, A.: A new method of assessing cardiac autonomic function and its comparison with spectral analysis and coefficient of variation of R-R interval. J. Auton. Nerv. Syst. 62(1), 79–84 (1997). http://www.sciencedirect.com/science/article/pii/S0165183896001129
15. Unity Technologies, I.: Unity3D. https://unity3d.com/
Author information
Authors and Affiliations
Embedded Systems Laboratory of Swiss Federal Institute of Technology Lausanne, Lausanne, Switzerland
Fabio Dell’Agnola, Niloofar Momeni, Adriana Arza & David Atienza
Corresponding authors
Correspondence to Fabio Dell’Agnola, Niloofar Momeni, Adriana Arza or David Atienza.
Editor information
Editors and Affiliations
U.S. Army Research Laboratory, Aberdeen Proving Ground, MD, USA
Jessie Y. C. Chen
U.S. Army Combat Capabilities Development Command, Orlando, FL, USA
Gino Fragomeni
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Dell’Agnola, F., Momeni, N., Arza, A., Atienza, D. (2020). Cognitive Workload Monitoring in Virtual Reality Based Rescue Missions with Drones. In: Chen, J.Y.C., Fragomeni, G. (eds) Virtual, Augmented and Mixed Reality. Design and Interaction. HCII 2020. Lecture Notes in Computer Science, vol 12190. Springer, Cham. https://doi.org/10.1007/978-3-030-49695-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49694-4
Online ISBN: 978-3-030-49695-1
eBook Packages: Computer Science, Computer Science (R0)