The method proposed in this study comprises several modules, which are discussed separately in the following sections.
3.1. Motion Detection using Accelerometer
The first step of the proposed method is to determine whether the user is walking or stationary. Predicting this state accurately is important because it not only improves the localization accuracy but can also save smartphone battery. Various techniques have been used for this task, including machine learning classifiers such as NB, Random Forest (RF), extra tree classifier, and ANN [29]. Many research works report that ANN produces more accurate results than traditional machine learning methods such as NB and RF [14,30,31,32]. However, several factors make ANN unsuitable for smartphone-based indoor localization. First, it requires a large amount of data for training and validation, and smaller datasets can decrease its performance [33,34]. Second, the resources required for ANN training are not yet available on a smartphone, so the training is carried out on a computer. Third, even when trained on a computer, the model cannot be deployed on a smartphone, at least for now. Real-time localization therefore requires two additional units: a server that hosts the trained ANN model and a communication channel between the user's smartphone and the server. This also introduces latency that depends on the type of channel used. Other machine learning methods, although less demanding of computing resources, are limited by similar constraints. For this reason, this study investigates a threshold method in which the accelerometer data from a smartphone are used for user motion detection. ANN and other machine learning techniques are known to show superior performance in motion detection tasks; the objective here is to evaluate how close a threshold method can come to the accuracy offered by machine learning and ANN methods.
To this end, four of the most widely used machine learning classifiers are investigated: DT, CART, NB, and KNN. DT is a simple yet powerful tool to infer decisions from a set of features. A DT comprises the root node, the internal nodes, and the terminals, where nodes and edges represent features and decisions, respectively [35]. DT is favorable because it is non-parametric and computationally inexpensive. Its results are easy to interpret, and it can tolerate redundant attributes in the data. CART makes it easy to visualize the predictors and can work with numeric, binary, and categorical data. It is noise-tolerant and insensitive to missing values, as it can accommodate missing data with surrogates [36,37]. It recursively splits the data into groups and grows the decision tree until a user-defined threshold is satisfied. Overfitting can be avoided by making a trade-off between the number of terminal nodes and deviance. Based on Bayes' theorem, NB predicts the probability that a particular sample belongs to a specific class. NB is simple, yet often more effective than more sophisticated classifiers [38,39]. Assuming that the values of attributes are conditionally independent, it assigns the sample to the class that achieves the highest posterior probability. KNN is one of the most widely used classifiers and is simple yet efficient by its structure. Often called a 'lazy learner', it does not make any assumptions about the data distribution. Given k neighbors, it divides the samples into different classes by deriving boundaries between the classes. Various choices for distance estimation between data points are available, and Euclidean Distance (ED) is regarded as a good choice for numerical data points. A new sample is assigned to a particular class based on the voting of its neighbors [40]. The ANN with the structure shown in Figure 1 is used for motion detection.
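The KNN voting rule described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the function names, the value k = 3, and the toy feature vectors are assumptions made purely for illustration.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train_x, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: euclidean(pair[0], sample))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors (x, y, z, total acceleration), in g.
train_x = [
    (0.01, 0.02, 1.00, 1.00),
    (0.00, 0.01, 0.99, 0.99),
    (0.02, 0.00, 1.01, 1.01),
    (0.30, 0.25, 1.20, 1.26),
    (-0.28, 0.31, 0.80, 0.90),
    (0.35, -0.22, 1.25, 1.32),
]
train_y = ["stationary"] * 3 + ["walking"] * 3

print(knn_predict(train_x, train_y, (0.00, 0.00, 1.00, 1.00)))
```

Because KNN stores the training samples and defers all computation to prediction time, the cost of each query grows with the size of the training set.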
The ANN used in this study has three hidden layers with ten neurons each. The hidden layers are fully connected, and the stochastic gradient descent method is used for optimization. One hundred epochs are used for training, the train-test split is 80–20, and the learning rate is set to 0.01. The task of the ANN is to classify the samples into motion and stationary classes, and the feature vector comprises four features, as shown in Table 1.
Before calculating the features from the accelerometer data, two important processes are carried out: bias correction and noise removal. Bias is the error that remains in the acceleration data even after the accelerometer is calibrated; it needs to be estimated and removed. For this purpose, the smartphone is placed motionless on a flat surface and the acceleration in x, y, and z is noted. Any deviation of the x, y, and z acceleration from 0, 0, and 1 g (9.8 m/s²), respectively, needs to be adjusted. So, the bias-free acceleration can be estimated as
where the terms represent the corrected, measured, and actual acceleration for the x axis.
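Using the corrected acceleration for x, y, and z, the total corrected acceleration can be calculated as
$$a = \sqrt{a_x^2 + a_y^2 + a_z^2}$$
where $a_x$, $a_y$, and $a_z$ denote the corrected acceleration along the x, y, and z axes.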
The features used for user state detection are selected because of their variability between walking and standing still. It is of course possible to derive further features from the accelerometer data, such as the mean, median, and inter-quartile range; however, doing so enlarges the feature vector and requires more training time and resources. Instead, this study considers only the acceleration in x, y, z, and the total acceleration for user motion detection. The behavior of the selected features for walking and standing motionless is shown in Figure 2.
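The bias correction and the four-element feature vector described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the bias is assumed to be the mean reading at rest minus the expected (0, 0, 1 g), the total acceleration is assumed to be the Euclidean norm of the corrected components, and all function names and sample readings are hypothetical.

```python
import math

GRAVITY = 9.8  # m/s^2, the 1 g reference quoted in the text


def estimate_bias(stationary_samples, expected=(0.0, 0.0, GRAVITY)):
    """Per-axis bias: mean reading at rest minus the expected reading."""
    n = len(stationary_samples)
    means = [sum(s[axis] for s in stationary_samples) / n for axis in range(3)]
    return tuple(m - e for m, e in zip(means, expected))


def corrected_features(sample, bias):
    """Return (x, y, z, a): bias-free components and their magnitude."""
    x, y, z = (v - b for v, b in zip(sample, bias))
    a = math.sqrt(x * x + y * y + z * z)  # assumed definition of total acceleration
    return x, y, z, a


# Hypothetical readings recorded with the phone lying flat and motionless.
rest = [(0.10, -0.20, 9.90), (0.30, -0.40, 10.10)]
bias = estimate_bias(rest)
print(corrected_features((0.20, -0.30, 10.00), bias))
```

Estimating the bias once at rest keeps the per-sample work to three subtractions and one square root, which suits on-device processing.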
The same features are used for threshold-based motion detection. Two threshold scenarios are investigated, one joining the individual variance thresholds through 'AND' and the other joining them using 'OR'. The goal is to refine the threshold values to detect the user's motion and stationary states. A two-step procedure is adopted for this purpose.
The 'OR' case is simple: the initially estimated individual variances are joined as they are. The 'AND' case involves an adjustment, which is done by varying the individual variances by a fixed value. This value is 0.01, and it is both added to and subtracted from the individual variances to find the optimal x, y, z, and a variances for motion detection.
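The two threshold scenarios can be sketched in Python as shown below. This is only an illustrative reading of the procedure: the windowing, the initial threshold values, the accuracy-based selection, and the function names are assumptions, and only a single step of 0.01 above and below each initial threshold is explored.

```python
from itertools import product
from statistics import pvariance


def feature_variances(window):
    """Variance of each feature over a window of (x, y, z, a) samples."""
    return [pvariance(column) for column in zip(*window)]


def is_walking(window, thresholds, mode="AND"):
    """Join the per-feature variance tests with AND or OR."""
    exceeded = [v > t for v, t in zip(feature_variances(window), thresholds)]
    return all(exceeded) if mode == "AND" else any(exceeded)


def tune_and_thresholds(initial, labelled_windows, step=0.01):
    """Try each threshold minus, at, and plus `step`; keep the most accurate set."""
    best, best_acc = tuple(initial), -1.0
    for cand in product(*[(t - step, t, t + step) for t in initial]):
        hits = sum(is_walking(w, cand, "AND") == label
                   for w, label in labelled_windows)
        acc = hits / len(labelled_windows)
        if acc > best_acc:
            best, best_acc = cand, acc
    return best, best_acc


# Hypothetical windows of (x, y, z, a) samples in g.
still = [(0.0, 0.0, 1.0, 1.0), (0.01, 0.0, 1.0, 1.0),
         (0.0, 0.01, 1.01, 1.01), (0.0, 0.0, 1.0, 1.0)]
walk = [(0.3, -0.2, 1.4, 1.45), (-0.3, 0.2, 0.6, 0.7),
        (0.25, -0.25, 1.3, 1.35), (-0.2, 0.3, 0.7, 0.79)]

initial = (0.02, 0.02, 0.02, 0.02)  # assumed starting thresholds
print(is_walking(walk, initial, "AND"), is_walking(still, initial, "OR"))
print(tune_and_thresholds(initial, [(still, False), (walk, True)]))
```

The 'AND' rule is the stricter of the two, since a window is labelled as walking only when every feature variance exceeds its threshold; this is why it benefits from the extra adjustment step.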
3.2. Step Detection and Heading Estimation
Step detection and heading estimation are performed using the accelerometer and gyroscope data from smartphone sensors. The bias correction for the accelerometer and gyroscope is carried out using the procedure given in Equation (1). Later, a low pass filter is used to remove noise in the data before further processing. Euler angles are used to transform smartphone motion to the inertial frame. There are three kinds of rotation for a smartphone, as shown in Figure 3.
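The text does not specify which low pass filter is applied. As an illustration only, a first-order exponential filter is sketched below in Python; the filter type, the smoothing factor `alpha`, and the function name are assumptions rather than details taken from the study.

```python
def low_pass(samples, alpha=0.2):
    """First-order low pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    filtered = []
    previous = None
    for x in samples:
        # The first sample initializes the filter state.
        previous = x if previous is None else previous + alpha * (x - previous)
        filtered.append(previous)
    return filtered


# Hypothetical noisy accelerometer readings in m/s^2.
print(low_pass([9.8, 10.4, 9.1, 9.9, 10.6, 9.3]))
```

A smaller `alpha` smooths more aggressively but also delays the response to genuine changes in motion, so the value would need to be tuned against the sampling rate.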
For reproducibility, this section discusses the coordinate transformation and yaw calculation as they are implemented in Android Studio 3.5. Coordinate transformation and yaw calculation require the data from three sensors: the magnetometer, accelerometer, and gyroscope (represented as M, A, and G, respectively). The sensor manager used in Android is represented as SM. First, a rotation matrix R is obtained using A and M. R corresponds to a 3 × 3 matrix as follows:
In Android, it is obtained using acceleration and magnetometer data as follows:
R is used to get the orientation angles O, which correspond to a 3 × 1 matrix as follows:
In Android, O is obtained using R as follows:
The elements of O are roll, pitch, and yaw at indices 2, 1, and 0, respectively. However, the orientation angles and gyroscope data need to be integrated over the change in time, represented here as Δt. This is done in Android as follows:
Later, these three angles are used to calculate the Euler angles E. The Euler angles correspond to a 3 × 3 matrix and are calculated in Android as follows:
The user walking angle is obtained using the Euler angles and the integrated gyroscope data calculated in Equation (8). It is calculated using
The change in the user's direction can be obtained by subtracting the previous angle from the current walking angle. The previous angle is replaced with the current angle every time a new calculation is made. Then, the resulting heading can be used with the user's detected steps and step length estimate to estimate their current relative position.
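The bookkeeping for the change in direction described above can be sketched in Python as follows. The class and function names are hypothetical, angles are assumed to be in radians, and the wrap-around handling is an added assumption that the text does not mention.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


class HeadingTracker:
    """Keeps the previous walking angle and reports the change on each update."""

    def __init__(self, initial_angle=0.0):
        self.previous = initial_angle

    def update(self, current_angle):
        # Change in direction = current angle minus previous angle.
        delta = wrap_angle(current_angle - self.previous)
        # The previous angle is replaced on every new calculation.
        self.previous = current_angle
        return delta


tracker = HeadingTracker(0.0)
print(tracker.update(0.5))
print(tracker.update(0.2))
```

Wrapping the difference avoids reporting a turn of almost a full circle when the angle crosses the ±π boundary.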
Step detection is carried out with the algorithm proposed in [14], and step length estimation is done using the Weinberg model [41]:
$$L = k \sqrt[4]{a_{\max} - a_{\min}}$$
where $L$ is the step length, $a_{\max}$ and $a_{\min}$ are the maximum and minimum acceleration in the given acceleration data, and k is a constant calculated during the calibration phase. The value of k used in this study is 0.435. Once the step length and the number of steps found in a given time t (2 s) are calculated, user position can be estimated as:
Figure 4 and Figure 5 show screenshots from the Android application of the predicted path for two different geometries.
The results shown in Figure 4 and Figure 5 indicate only the output of the PDR module and do not portray the localization results. It is evident from the figures that the gyroscope error accumulates over time, which is the basic limitation of a PDR system. However, as described in Section 3.3.2, the final position is calculated using both PDR and the magnetic field data. The PDR data are therefore used only for distance and heading estimation over a short period. Once the user location is finalized, the PDR data are reset. This approach is superior to simple PDR because the gyro drift does not accumulate.
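The step length and position update used by the PDR module can be sketched in Python as follows. The Weinberg model is taken in its usual form, k times the fourth root of the difference between the maximum and minimum acceleration, with k = 0.435 as reported above. The position update and its heading convention (radians, measured clockwise from the +y axis) are assumptions, as are the function names and sample values.

```python
import math

K = 0.435  # value of k reported in the text


def weinberg_step_length(acc_window, k=K):
    """Step length from the spread of acceleration within one window."""
    return k * (max(acc_window) - min(acc_window)) ** 0.25


def update_position(x, y, heading, step_length, n_steps):
    """Advance (x, y) by n_steps along `heading`.

    Assumed convention: heading in radians, clockwise from the +y axis.
    """
    distance = step_length * n_steps
    return x + distance * math.sin(heading), y + distance * math.cos(heading)


# Hypothetical total-acceleration samples (m/s^2) from a 2 s window.
window = [9.1, 10.9, 8.7, 11.4, 9.5]
length = weinberg_step_length(window)
print(update_position(0.0, 0.0, math.radians(90.0), length, 3))
```

Because each update adds to the previous position, any heading error carries forward, which is the drift visible in the PDR-only paths.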