CNN-BiLSTM Daily Precipitation Prediction Based on Attention Mechanism

Longfei Guo

^1,*

Yunwei Pu

² and

Wenxiang Zhao

Faculty of Land and Resources Engineering, Kunming University of Science and Technology, Kunming 650093, China

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China

Author to whom correspondence should be addressed.

Atmosphere2025,16(3), 333;https://doi.org/10.3390/atmos16030333

Submission received: 23 February 2025 /Revised: 8 March 2025 /Accepted: 13 March 2025 /Published: 15 March 2025

(This article belongs to the SectionAtmospheric Techniques, Instruments, and Modeling)

Downloadkeyboard_arrow_down

Browse Figures

Versions Notes

Abstract

Accurate daily precipitation forecasting is crucial for the rational utilization of water resources and the prediction of flood disasters. To address the low reliability and low prediction accuracy of existing daily precipitation prediction models based on deep learning which arise from the nonlinear and non-stationary characteristics of surface precipitation data, this paper first employs the principal component analysis (PCA) method to extract the principal components of the original data. Given that the convolutional neural network (CNN) is adept at capturing spatial dependencies, bidirectional long short-term memory (Bi-LSTM, a variant of long short-term memory (LSTM)) can capture the long-term dependence of time-series data, and the attention mechanism allows the model to focus on the more important features of the input data. A PCA-CNN-BiLSTM-Attention fusion neural network was constructed. Taking Kunming, China as the study area, the experimental results demonstrate that the Nash efficiency coefficient of the proposed model reaches 0.993, which is 15.3% and 12.6% higher than that of the CNN-LSTM and CNN-BiLSTM models, respectively. This indicates high prediction accuracy and provides an effective and feasible method for daily precipitation prediction.

Keywords:

principal component analysis;convolutional neural networks;bidirectional long short-term memory network;attention mechanisms;precipitation forecast

1. Introduction

In recent decades, the problem of heavy rainfall and flooding has become increasingly severe worldwide, due to the impacts of monsoon climate change and rapid human development, posing a significant threat to human survival and development [1,2]. Accurate daily precipitation prediction is not only a core issue of meteorological science, but also a key support for social and economic development. For example, in flood prevention and disaster mitigation, forecasting heavy rainfall 24 h in advance can help government departments initiate emergency responses promptly, thereby reducing casualties and property damage. In agricultural irrigation, accurate precipitation forecasting can optimize water distribution and boost crop yields. In urban planning, drainage systems can be effectively designed with precipitation predictions to prevent waterlogging. Additionally, for ecologically fragile areas (such as the Kunming Plateau Lake), precipitation prediction can provide a scientific basis for ecological water replenishment and biodiversity maintenance. Therefore, enhancing the accuracy of daily precipitation prediction is of great practical urgency and has extensive socio-economic value [3].

Precipitation forecasting methods can be broadly divided into two categories: process-based methods and data-driven methods. Process-based methods rely on a deep understanding of atmospheric physical processes and can offer intuitive explanations of precipitation formation mechanisms, making them particularly suitable for long-term forecasting and complex weather system simulations. Data-driven methods, on the other hand, are highly adaptable, capable of handling large-scale data and capturing complex nonlinear relationships, making them suitable for real-time and short-term forecasting tasks [4].

Currently, researchers worldwide have conducted extensive studies in precipitation prediction, ranging from the early traditional statistical model and the autoregressive moving average model (ARIMA) to the convolutional neural network (CNN) [5] and recurrent neural network (RNN) [6,7], which can capture both the temporal and spatial features of precipitation. Long short-term memory (LSTM) is a special type of RNN, and the LSTM model has a unique “gate” structure that solves the problem of gradient explosion and vanishing in the training of long time-series processes and improves the accuracy of long-term process simulation. Shen Haojun et al. [8] used LSTM to study summer precipitation in China, providing a reference for seasonal precipitation prediction. Kang et al. [9] selected an LSTM model with multiple input variables to predict daily precipitation in Jingdezhen, Jiangxi Province. Han Ying et al. [10] used the advantages of deep learning and width learning to propose an improved LSTM model (LSTM-WBLS), which provides a novel approach for precipitation prediction research. To address the low accuracy in predicting extreme precipitation values and no-rain days, Ling et al. [11] proposed a combined framework integrating support vector machines (SVMs), complete ensemble empirical mode decomposition (CEEMDAN), and bidirectional long short-term memory networks (BiLSTM) for daily precipitation prediction in the Poyang Lake Basin. In recent years, attention mechanisms have been widely applied and continuously optimized and are now utilized in popular research fields such as computer vision, speech recognition, and image recognition. Some scholars have attempted to apply attention mechanisms to precipitation forecasting to improve prediction accuracy. Cheng Yuxiang [12] proposed an attention-based BiLSTM model to analyze the weights of meteorological factors. Compared to traditional methods, this model shows superior performance in predicting precipitation influenced by multiple meteorological factors. However, these methods are limited by their relatively simple feature learning abilities, and cannot fully capture complex spatiotemporal relationships, which affects accuracy and generalization capabilities. In addition, when dealing with high-dimensional multi-source data, they may encounter dimensionality disasters that make the model difficult to train and optimize.

Therefore, combining the advantages of PCA, CNN-BiLSTM, and attention mechanisms, this paper proposes a new method for daily precipitation prediction based on a PCA-CNN-BiLSTM-Attention model, tailored to address the nonlinear and temporal characteristics of precipitation data. Among them, PCA can effectively extract the main features in the data, reduce redundant information, and improve the training efficiency and accuracy of the algorithm. Subsequently, convolutional neural networks (CNNs) can effectively capture the nonlinear local features in precipitation data. Next, the bidirectional time-dependent features of the sequence data can be extracted using a bidirectional long-term short-term memory network (BiLSTM) layer. On this basis, the features generated by the hidden layer of BiLSTM are used as inputs of the attention mechanism. The attention mechanism then distinguishes the importance of the time features extracted by BiLSTM through a weighting method, so as to reduce the interference of redundant information on the precipitation prediction results. Finally, comparative experiments prove the reliability and effectiveness of the proposed method, which can provide a reference for agriculture and water conservancy departments to make water resources management decisions and then reduce the risk of drought and flood disasters.

2. Data and Methods

2.1. Principal Component Analysis (PCA)

Principal component analysis (PCA) is a widely used dimensionality reduction method. It transforms the original variables into a set of linearly uncorrelated variables, known as principal components, through orthogonal transformation. By representing the original data with fewer principal components, PCA achieves dimensionality reduction [13]. In meteorological data analysis, PCA reduces multicollinearity among original meteorological variables, making complex datasets easier to understand and process. By applying linear transformations to historical meteorological data, PCA identifies the principal components that best capture the overall trends in the data. These principal components are typically sorted by their contribution to variance, and when the cumulative contribution rate exceeds a certain threshold, they can be used to characterize the original variables.

Consider a sample dataset

X

of dimension

m * n

, where

m

is the number of samples and each sample has

n

-dimensional features, representing

n

attributes or indicators of the dataset and mapping the data into

n

dimensions in space, as shown in Equation (1):

X = (\begin{matrix} x_{11} & \begin{matrix} x_{12} & \dots & x_{1 n} \end{matrix} \\ \begin{matrix} x_{21} \\ ⋮ \\ x_{m 1} \end{matrix} & \begin{matrix} \begin{matrix} x_{22} \\ ⋮ \\ x_{m 2} \end{matrix} & \begin{matrix} \dots \\ ⋮ \\ \dots \end{matrix} & \begin{matrix} x_{2 n} \\ ⋮ \\ x_{m n} \end{matrix} \end{matrix} \end{matrix})

(1)

The steps of PCA are as follows:

Different features in the original dataset have different dimensions, and standardized data features can unify the magnitudes of each variable into the same range, thereby reducing the negative impact of the numerical coefficients of the eigenvalues on the analysis results. The specific calculation formula is shown in Equation (2):

x_{i j}^{*} = \frac{x_{i j} - {\bar{x}}_{j}}{\sqrt{v a r (x_{j})}}, i = 1, 2, \dots, m; j = 1, 2, \dots, n

(2)

where

{\bar{x}}_{j} = \frac{1}{n} \sum_{i = 1}^{n} x_{i j}, v a r (x_{j}) = \frac{1}{n - 1} \sum_{i = 1}^{n} {(x_{i j} - {\bar{x}}_{j})}^{2}

are the mean and variance of each observation sample

x_{i}

of dataset

X

, and

x_{i j}^{*}

is the normalized value of

x_{i j}

, forming a standardized data matrix

x^{*}

The correlation coefficient matrix

R

can be calculated as shown in Equation (3):

R = \frac{x^{* T} x^{*}}{n - 1}

(3)

According to the eigenequation

| λ_{E} - R | = 0 (E is the identity matrix),

the eigenvalues of the correlation coefficient matrix

R

λ_{i}

(i = 1, 2, …, m),

λ_{1}

≥

λ_{2}

≥ … ≥

λ_{m}

≥ 0 solve

(λ_{i} E - R)

= 0 to obtain the eigenvector

μ_{1} \geq μ_{2} \geq \dots \geq μ_{m}

corresponding to

λ_{i}

, so as to obtain the eigenvalues and eigenvectors.

The variance contribution

q_{j}

and cumulative variance contribution

Q_{m}

are calculated as shown in Equations (4) and (5):

q_{j} = \frac{λ}{\sum_{j = 1}^{m} λ_{m}}

(4)

Q_{m} = \sum_{j = 1}^{m} q_{j}

(5)

The number of principal components is determined according to the cumulative variance contribution rate, and the principal components are calculated according to the impact factor component matrix.

2.2. Convolutional Neural Networks (CNNs)

CNNs have a unique three-layer architecture composed of convolutional layers, pooling layers, and fully connected layers, enabling them to efficiently extract relevant features related to precipitation and other factors. In the convolutional layer, filters are applied to extract features from the input data. These filters perform convolution operations on local regions of the input data via sliding windows, generating new feature maps. Pooling layers reduce the spatial dimensions of the feature maps while retaining the most significant features. Activation functions such as ReLU increase the nonlinear expressive power of the network. Fully connected layers integrate the features extracted by the convolutional layers and perform the final classification or regression tasks. The basic structure of CNNs is shown inFigure 1.

The convolution operation is shown in Equation (6):

x_{j}^{l} = f \times x_{t - 1} \times k_{j}^{l}, j \in J

(6)

where

x_{t - 1}

is the output value of the convolution operation of its (L − 1) input using different filters (

J

k_{j}^{l}

is the output value of each filter, and

f

is the corresponding activation function after the convolution operation.

The max-pooling operation is shown in Equation (7):

p_{j}^{l + 1} = m a x (x_{j}^{l} (t)), j \in J

(7)

where

x_{j}^{l} (t)

is the feature map extracted from the previous convolutional layer, and

t

is the size of the merged values.

2.3. Bidirectional Long Short-Term Memory Networks (BiLSTM)

Bidirectional long short-term memory (BiLSTM) networks are a specialized type of recurrent neural network (RNN) that incorporates two cell-state propagation paths: one moving forward (past to future) and one moving backward (future to past). This bidirectional architecture enables BiLSTM to capture past and future temporal information, while also exploring the relationships between historical and future temporal contexts through recursive and feedback-driven processing. By arranging neurons in opposing directions, a bidirectional training mechanism is established, leveraging both past and future data to construct the BiLSTM network.

The core LSTM model [14] treats neurons as the smallest units of information processing, with each neuron comprising three “gates” that continuously update its state. These three gates are defined as the forget gate Ft, the input gate It, and the output gate Ot. At time step t, each gate receives the input value Zt at the current time step and the output value Ht − 1 from the previous time step, collectively influencing the training process at time step t. The structure of an LSTM unit is illustrated inFigure 2.

The computation process is shown in Equation (8):

\begin{matrix} f_{t} = σ (W_{f} x_{t} + U_{f} h_{t - 1} + b_{f}) \\ i_{t} = σ (W_{i} x_{t} + U_{i} h_{t - 1} + b_{i}) \\ \tilde{C_{t}} = t a n h (W_{c} x_{t} + U_{c} h_{t - 1} + b_{c}) \\ C_{t} = f_{t} * C_{t - 1} + i_{t} * \tilde{C_{t}} \\ o_{t} = σ (W_{o} x_{t} + U_{o} h_{t - 1} + b_{0}) \\ h_{t} = o_{t} * t a n h (C_{t}) \end{matrix}

(8)

where

W_{f}

W_{i} W_{c} W_{o}

are the weight matrices of the forget gate, input gate, cell-state update, and output gate, respectively.

b_{f} b_{i} b_{c} b_{o}

, and bobo are the biases of the forget gate, input gate, cell-state update, and output gate, respectively.

f_{c} i_{t}

and

o_{t}

are the outputs of the forget gate, input gate, and output gate, respectively.

σ

is the sigmoid function,

h_{t}

is the hidden state of the cell, and

U

is also the weight matrix corresponding to each gate.

Assume that the first LSTM layer processes information in chronological order, while the second LSTM layer processes information in reverse chronological order. At time stept, the hidden states of the forward LSTM and backward LSTM are defined as

\overset{\leftarrow}{h_{t}}

and

\vec{h_{t}}

, respectively. The layer-wise computation of the network is expressed in Equation (9) as follows:

\begin{matrix} \vec{h_{t}} = f (W^{(1)} x_{t} + V^{(1)} \vec{h_{t - 1}} + b^{(1)}) \\ \overset{\leftarrow}{h_{t}} = f (W^{(2)} x_{t} + V^{(2)} \vec{h_{t - 1}} + b^{(2)}) \\ h_{t} = \vec{h_{t}} ⨁ \overset{\leftarrow}{h_{t}} \\ y_{t} = g (U h_{t} + c) \end{matrix}

(9)

whereW,V,

U

are the weight matrices of the input layer, hidden layer, and output layer, respectively,b andc are the offsets,

⨁

represent the vector splicing, andg is the activation function.

2.4. Attention Mechanism

The attention mechanism is inspired by the way the human brain processes information. In deep learning, it is primarily used to assign different weights to various parts of the input sequence, determining the importance of each part in the output sequence. This allows the model to focus on the more important parts of the input sequence. The structure of an attention unit is illustrated inFigure 3.

The computation process is shown in Equation (10):

\begin{matrix} S_{t i} = V t a n h (W h_{t} + U h_{t} + b) \\ a_{t i} = \frac{\exp (S_{t i})}{\sum_{k = i}^{t} \exp (S_{t k})} \\ F = \sum_{i = 1}^{t} a_{t i} \times h_{i} \end{matrix}

(10)

where

a_{t i}

is the attention weight of the BiLSTM hidden layer output

h_{i}

for the current input,

y_{t}

is the input sequence,

h_{t}

is the hidden state corresponding to the input sequence, and V, W, U, and b are the model’s learnable parameters.

2.5. PCA-CNN-BiLSTM-Attention Model

Based on the above algorithms, a hydrological model that integrates PCA, CNN-BiLSTM, and attention mechanisms is proposed. The model’s workflow is shown inFigure 4, and the basic process is as follows:

(1): Collect and organize precipitation, temperature, and other meteorological spatial data, reconstruct the data into one-dimensional sequences, and normalize the data using the “max–min” method to enhance model training stability.
(2): Use PCA to reduce the dimensionality of the normalized meteorological data and select key principal component variables based on a predefined threshold, representing the main spatial features of precipitation and temperature.
(3): Use convolutional neural networks (CNNs) to extract temporal features and capture local feature information. The processed time series data are then input into a bidirectional long short-term memory network (BiLSTM) to learn long-term and short-term dependencies in the data.
(4): Introduce an attention mechanism at the output layer of the BiLSTM model to enhance the model’s focus on important time steps, improving prediction accuracy. The features processed by the attention mechanism are then input into a fully connected layer, where linear regression is used to predict the precipitation variable at the target time step (L). Linear regression, as a simple and effective method, maps high-dimensional features (extracted by CNN, BiLSTM, and attention mechanisms) to the scalar precipitation values at the target time step. The linear regression in the fully connected layer is consistent with the end-to-end deep learning framework, ensuring computational efficiency and seamless gradient propagation during the training process.

2.6. Study Area

Kunming is located in central Yunnan, China, with latitudes ranging from 24° N to 27° N and longitudes from 101° E to 104° E. Situated on the Yunnan–Guizhou Plateau, the region has an elevation range of 692 to 4219 m. The underlying surface primarily consists of subtropical evergreen broad-leaved forests and partial grasslands. The climate is characterized by a subtropical monsoon—warm, humid, and distinctively seasonal. The East Asian monsoon significantly influences Kunming, resulting in relatively uniform precipitation distribution. The average annual precipitation in Kunming ranges from 1000 to 1500 mm, with the majority concentrated during the summer months. High-precision precipitation modeling and forecasting are crucial for flood prevention, water resource management, and environmental protection in Kunming and its surrounding areas.

2.7. Data

The data used in this study were sourced from the China Ground Cumulative Daily Value Dataset (V3.0) released by the China Meteorological Data Network. The daily live observation meteorological data of Kunming City (station number 56778) from 1 January 1953 to 31 December 2019 were selected as the research object. A total of 24,472 days of meteorological data were collected. Among them, a small number of missing data were supplemented using linear interpolation, and some experimental data are presented inTable 1. The partially missing data are shown inTable 2.

3. Results and Discussions

3.1. Model Parameter Settings

The scope of the data is the meteorological observation data collected by the meteorological observation station in Kunming from 1953 to 2019, and the total data of 24,472 days is divided into training, validation, and test sets at a ratio of 7:2:1. To evaluate the impact of different principal component thresholds on the precipitation value simulation, the principal component thresholds are set to 85%, 90%, and 95%, respectively. In the preceding model structure, the core and training parameters are set before introducing the training set for model training. Among them, the core parameters include the input dimension (input_dim), output dimension (units), input sequence length (input_length), and random inactivation (dropout). Training parameters include epochs, learning rate, number of samples per study (batch_size), and percentage of validation set (validation_split).Table 3 lists the core parameters and their set values.

To evaluate the robustness of the CNN-BiLSTM-Attention and PCA-CNN-BiLSTM-AM models against input data uncertainty, Gaussian noise of varying intensities was introduced to the input data to observe how model performance changes with increasing noise levels. If the model performance declines steadily as the noise level increases, this indicates that the model is not overfitting and demonstrates strong robustness. Ten sets of Gaussian noise were generated, each with a mean of 0 and standard deviations ranging from 0.1 to 1.0 in increments of 0.1. For each noise level, 50 random noise vectors were generated to minimize the impact of noise randomness on model performance.

3.2. Evaluation Indicators

r,α,β, the RMSE, MAE, and NSE [15], were selected to systematically evaluate the accuracy of the model. The root mean square error (RMSE) was calculated as the square root of the mean of the squares of the difference between the predicted and true values. Lower values indicate that the model’s predictions are closer to the actual values, thus reflecting higher accuracy. The mean absolute error (MAE) is calculated as the average of the absolute difference between the predicted value and the true value. The Nash efficiency coefficient (NSE) is a widely used metric in hydrology and climatology to evaluate fluid dynamics models. It normalizes the residual signal to the range of the measured signal, with values ranging from −1 to 1. When the NSE is close to 1, it indicates that the model prediction is highly accurate. A value close to 0 suggests poor prediction performance, while a negative value indicates that the model is distorted. The correlation coefficient (r) assesses the strength and direction of the linear relationship between the simulated and measured values. The mean deviation (α) calculates the difference between the average of the simulated values and the average of the measured values. The standard deviation (β) evaluates the deviation of the standard deviation of the simulated value from the standard deviation of the measured value. It is calculated as follows in Equation (11):

E_{R M S E} = \sqrt{\frac{1}{n} \sum_{i}^{n} {(y_{o, i} - y_{p, i})}^{2}} E_{M A E} = \frac{1}{n} \sum_{i = 1 `}^{n} | (y_{o, i} - y_{p, i}) | E_{N S} = 1 - \frac{\sum_{i = 1}^{n} {(y_{o, i} - y_{p, i})}^{2}}{\sum_{i = 1}^{n} {(y_{o, i} - \bar{y_{o}})}^{2}} r = \frac{\sum_{i = 1}^{n} (y_{p, i} - \bar{y_{p}}) (y_{o, i} - \bar{y_{o}})}{\sqrt{\sum_{i = 1}^{n} {(y_{o, i} - \bar{y_{o}})}^{2} \sum_{i = 1}^{n} {(y_{p, i} - \bar{y_{p}})}^{2}}} α = \frac{σ_{p}}{σ_{o}} β = \frac{y_{p, i} - y_{o, i}}{y_{o, i}}

(11)

where

n

is the length of the simulation period,

y_{o, i}

refers to the observed value of precipitation at time

i

y_{p, i}

refers to the simulated value of precipitation at time

i

\bar{y_{o}}

represents the mean of the observed value of precipitation,

\bar{y_{p}}

represents the mean value of the simulated precipitation, and

σ_{p}

and

σ_{o}

are the standard values of predicted and simulated precipitation, respectively.

3.3. Evaluation of the PCA-CNN-BiLSTM-Attention Model

To verify the effectiveness of the proposed model, the precipitation sequences from the test set and the predicted values from various models were visualized. The comparison is shown inFigure 5 which displays the predicted and actual precipitation values from 1 January 2018 to 31 December 2019 (730 days). The testing period of 730 days can fully cover the seasonal fluctuations of meteorological data. In the visualization of time-series forecasting, a two-year cycle neither causes trends to become vague due to excessive duration nor loses periodic characteristics due to insufficient duration.

As shown inFigure 5, the PCA-CNN-BiLSTM-Attention model simulates and measures precipitation at Kunming station when the principal component thresholds are set to 85%, 90%, and 95%, respectively.Table 4 indicates that the Nash efficiency coefficient (

E_{n s e}

) values for these thresholds are 0.954, 0.962, and 0.993, respectively. The root mean square error (RMSE) values are 1.585, 0.879, and 0.694, while the mean absolute error (MAE) values are 0.685, 0.365, and 0.334. The correlation coefficients (r) are 0.98, 0.98, and 0.99, demonstrating strong consistency between simulated and observed precipitation. Additionally, the mean bias results reveal a slight underestimation (less than 10%) across all three thresholds, indicating minimal model bias and accurate precipitation simulation. In conclusion, the PCA-CNN-BiLSTM-Attention model exhibits excellent performance in daily precipitation forecasting. As the threshold increases, the model’s prediction accuracy improves significantly, as reflected in the optimization of all metrics. This underscores the model’s capability to effectively handle complex spatiotemporal data characteristics, achieving superior performance in the Kunming region.

3.4. Comparison of Results from Different Models

Figure 6 andTable 5 show the simulation effects and evaluation index results of the comparison models LSTM, CNN-LSTM, CNN-BiLSTM, and CNN-BiLSTM-Attention. FromTable 5, it can be seen that the NSE of the LSTM, CNN-LSTM, CNN-BiLSTM, and CNN-BiLSTM-Attention models was 0.834, 0.866, 0.887, and 0.942, respectively. The RMSE was 3.276, 3.143, 2.893, and 2.124, respectively. The MAE was 2.246, 2.087, 1.047, and 0.797, respectively. The results show that the four models can simulate daily precipitation effectively, with the CNN-BiLSTM-Attention model performing the best and the LSTM model performing the worst. Moreover, the precipitation deviation of the PCA-CNN-BiLSTM-Attention model was 4%, which was much lower than that of the LSTM (16%), CNN-LSTM (12%), CNN-BiLSTM (15%), and CNN-BiLSTM-Attention (10%) models.

As can be seen fromFigure 5 andFigure 6, since precipitation is seasonal and cyclical, BiLSTM captures the time dependencies in the data more comprehensively than LSTM, which helps improve the understanding of seasonality and periodicity. By introducing attention mechanisms, the prediction performance of the model has been significantly improved, improving its accuracy, adaptability, and interpretability when processing precipitation data. This helps effectively mine and utilize the key information in the data, so as to improve the effect of precipitation prediction. For the prediction of extreme values of precipitation data, the use of PCA for precipitation prediction, especially in the face of extreme values of precipitation data, can improve the robustness, accuracy, and generalization ability of the model and more effectively predict the extreme cases of precipitation data.

The proposed PCA-CNN-BiLSTM-AM model achieves the best performance, with the lowest RMSE and MAE values, indicating that the CNN-BiLSTM model, enhanced by the attention mechanism and PCA, can more accurately predict daily precipitation, especially when the principal component threshold is set to 95%.

3.5. Robustness Evaluation of the Mode

To evaluate the robustness of the model, Gaussian noise is introduced. By adding Gaussian noise of varying intensities to the original data, the model’s performance under various uncertainties is observed.Figure 7 shows the simulation results of the CNN-BiLSTM-Attention and PCA-CNN-BiLSTM-Attention models under different standard deviations of Gaussian noise. As the PCA threshold increases (85%, 90%, 95%), the model’s robustness to data perturbations improves, indicating that more effective feature selection and information compression allow the model to maintain high accuracy under data perturbations. As the standard deviation of the Gaussian noise increases, the Nash efficiency coefficient (NSE) decreases significantly, indicating a gradual decline in simulation performance. However, the decline in simulation performance is smooth, showing that the PCA-CNN-BiLSTM-Attention model does not overfit the principal component variables used as inputs to the CNN-BiLSTM-Attention model, demonstrating good robustness. The combination of PCA, CNN, BiLSTM, and attention mechanisms shows superior performance in handling complex data, with multi-level feature-learning mechanisms allowing the model to effectively process the impact of Gaussian noise. What is more, the attention mechanism maintains the ability to identify key features even when more information is provided, further enhancing the model’s effectiveness under different levels of perturbation.

4. Conclusions

In this study, principal component analysis (PCA), bidirectional long short-term memory networks (BiLSTMs), and attention mechanisms were integrated for daily precipitation forecasting. The method effectively extracts the spatial characteristics of meteorological elements and comprehensively captures the contextual information in the time series. A systematic review of the PCA-CNN-BiLSTM-Attention model in the Kunming study area showed that, compared with the basic model, the model achieved a Nash efficiency coefficient of 0.993, with the RMSE and MAE reduced by 67.31% and 58.12%, respectively. These results verify that the model has strong applicability and robustness.

The model can be integrated into the urban emergency management system, predicting heavy rain events 24 h in advance and assisting relevant departments in the timely activation of emergency plans. For example, by combining the topographical data of Kunming City with historical flood records, the model can further optimize warning thresholds, dynamically assess flooding risks under different precipitation intensities, and provide a scientific basis for personnel evacuation and resource allocation.

The model in this study still has the following limitations: first, it is sensitive to data quality, and the imputation of missing values may introduce bias. Second, the prediction stability decreases in rare extreme weather events. Third, the computational complexity of the model is high, and GPU acceleration is required to meet real-time requirements. This study explores the advantageous fusion of deep learning, principal component analysis, and attention mechanism in precipitation prediction and provides a new feasible scheme for precipitation prediction research. Future work will integrate multi-source data, such as terrain elevation and satellite remote sensing, and explore the design of lightweight models to improve generalization capabilities.

Author Contributions

Conceptualization, L.G. and Y.P.; methodology, L.G. and Y.P.; software, L.G.; validation, L.G.; formal analysis, L.G. and Y.P.; investigation, L.G. and Y.P.; resources, L.G. and Y.P.; data curation, L.G.; writing—original draft preparation, L.G.; writing—review and editing, L.G., Y.P., and W.Z.; visualization, L.G.; supervision, Y.P.; project administration, L.G.; All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request from the authors.

Acknowledgments

I would like to thank Yunwei Pu and Wenxiang Zhao for their valuable contributions to this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Xu, Z.; Chen, H.; Ren, M.; Cheng, T. Progress on disaster mechanism and risk assessment of urban flood/waterlogging disasters in China.Adv. Water Sci.2020,31, 713–724. [Google Scholar] [CrossRef]
Wang, H.P.; Zhang, B.; Liu, Z.H.; He, J.Y.; Luo, J. Chaos theory-based comparative study on monthly rainfall characteristics in Wuhan and Yichang during recent 60years.J. Nat. Disasters2012,21, 111–118. [Google Scholar] [CrossRef]
Xie, H.L.; Peng, G.H.; Guo, M.C. Application of an Improved SVR Model in Annual Precipitation Prediction.Math. Pract. Theory2017,47, 154–161. [Google Scholar]
Yang, Q.; Qin, L.; Gao, P. Annual Precipitation Prediction in the Economic Zone of the Northern Slope of Tianshan Mountains Based on the EEMD-LSTM Model.Arid Zone Res.2021,38, 1235–1243. [Google Scholar]
Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition.Proc. IEEE1998,86, 2278–2324. [Google Scholar] [CrossRef]
Yang, S.; Yang, D.; Chen, J.; Santisirisomboon, J.; Lu, W.; Zhao, B. A physical process and machine learning combined hydrological model for daily streamflow simulations of large watersheds with limited observation data.J. Hydrol.2020,590, 125206. [Google Scholar] [CrossRef]
Kim, T.; Shin, J.-y.; Kim, H.; Heo, J.H. Ensemble-Based Neural Network Modeling for Hydrologic Forecasts: Addressing Uncertainty in the Model Structure and Input Variable Selection.Water Resour. Res.2020,56, e2019WR026262. [Google Scholar] [CrossRef]
Shen, H.J.; Luo, Y.; Zhao, Z.C. Research on Summer Precipitation Prediction in China Based on LSTM Network.Adv. Clim. Change Res.2020,16, 263–275. [Google Scholar]
Kang, J.; Wang, H.; Yuan, F.; Wang, Z.; Huang, J.; Qiu, T. Prediction of Precipitation Based on Recurrent Neural Networks in Jingdezhen, Jiangxi Province, China.Atmosphere2020,11, 246. [Google Scholar] [CrossRef]
Han, Y.; Guan, J.; Cao, Y.; Luo, J. Application of LSTM-WBLS Model in Daily Precipitation Prediction.J. Nanjing Univ. Inf. Sci. Technol. (Nat. Sci. Ed.)2023,15, 180–186. [Google Scholar]
Ling, M.; Xiao, L.Y.; Zhao, J. Daily Precipitation Prediction Based on the SVM-CEEMDAN-BiLSTM Model.People’s Pearl River2023,44, 61–68. [Google Scholar]
Cheng, Y.X.; Xiao, L.Y.; Wang, P.G. Monthly Precipitation Prediction Based on the Hybrid Attention-BiLSTM Model.People’s Pearl River2024,45, 73–81. [Google Scholar]
Zhu, C.M.; Wu, H.J.; Song, X.Y. Application of SVR model based on multi-factors combination in streamflow forecasting of Songhua River basin.Water Resour. Power2021,39, 12–15, 41. [Google Scholar]
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory.Neural Comput.1997,9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles.J. Hydrol.1970,10, 282–290. [Google Scholar] [CrossRef]

Figure 1. Structure of CNNs.

Figure 2. LSTM network structure.

Figure 3. Structure of attention.

Figure 4. Flowchart of the PCA-CNN-BiLSTM-Attention model.

Figure 5. Prediction effect of PCA-CNN-BiLSTM-Attention model under different principal component thresholds.

Figure 6. Predict the effect.

Figure 7. Robustness evaluation results.

Table 1. Partial experimental data.

	TEM/°C	RHU/%	WIN/m/s	EVP/mm	PRE/mm	SSD/h	PRS/hPa
1 January 1953	7.4	77	1.2	9.7	0	3	81,130
2 January 1953	7.5	80	3.9	9.4	0	4	81,030
3 January 1953	8.5	81	3.4	8.3	0	3.8	81,040
…	…	…	…	…	…	…	…
30 December 2019	7.3	66	1.7	9.3	0	2.7	81,530
31 December 2019	8.4	73	1.8	6.6	0	2.2	81,790

Notes: … Represents the number of days omitted, TEM stands for air temperature, RHU stands for Relative Humidity, WIN stands for Wind Speed, EVP stands for evaporation, PRE stands for precipitation, SSD stands for Sunshine Time, PRS stands for Barometric Pressure.

Table 2. Partially missing data.

	TEM/°C	RHU/%	WIN/m/s	EVP/mm	PRE/mm	SSD/h	PRS/hPa
7 January 1968	8.8	68	2	9.7	0.01	3	81,070
8 January 1968	9.6	78	5.3		0		81,230
9 January 1968	10.6	71	2.5	9.3	0		81,230
…	…	…	…	…	…	…	…
25 October 1968	17.2	86	0.7	6.1	1.8		81,090
26 October 1968	10.1	96	1.3	0	11.2		81,590

Table 3. Core parameters and their set values.

	The Name of the Parameter	Parameter Value
Core parameters	input_dim	6
	units	1
	input_length	10
	dropout	0.3
Training parameters	epochs	200
	learning rat	0.001
	batch_size	64

Table 4. The evaluation index results of PCA-CNN-BiLSTM-Attention model under different principal component thresholds.

Threshold/%	NSE	RMSE	MAE	β	r	α
85	0.954	1.585	0.685	−0.04	0.98	0.91
90	0.962	0.879	0.365	−0.08	0.98	0.92
95	0.993	0.694	0.334	−0.07	0.99	0.99

Table 5. The results of different models were evaluated.

Model	NSE	RMSE	MAE	β	r	α
LSTM	0.834	3.276	2.246	−0.16	0.92	0.94
CNN-LSTM	0.866	3.143	2.087	−0.12	0.93	0.96
CNN-BiLSTM	0.887	2.893	1.047	−0.15	0.94	0.98
CNN-BiLSTM-Attention	0.942	2.1237	0.797	−0.10	0.97	0.92

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Guo, L.; Pu, Y.; Zhao, W. CNN-BiLSTM Daily Precipitation Prediction Based on Attention Mechanism.Atmosphere2025,16, 333. https://doi.org/10.3390/atmos16030333

AMA Style

Guo L, Pu Y, Zhao W. CNN-BiLSTM Daily Precipitation Prediction Based on Attention Mechanism.Atmosphere. 2025; 16(3):333. https://doi.org/10.3390/atmos16030333

Chicago/Turabian Style

Guo, Longfei, Yunwei Pu, and Wenxiang Zhao. 2025. "CNN-BiLSTM Daily Precipitation Prediction Based on Attention Mechanism"Atmosphere 16, no. 3: 333. https://doi.org/10.3390/atmos16030333

APA Style

Guo, L., Pu, Y., & Zhao, W. (2025). CNN-BiLSTM Daily Precipitation Prediction Based on Attention Mechanism.Atmosphere,16(3), 333. https://doi.org/10.3390/atmos16030333

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further detailshere.

Article Metrics

Article Access Statistics

For more information on the journal statistics, clickhere.

Multiple requests from the same IP address are counted as one view.

Movatterモバイル変換

Article Menu

CNN-BiLSTM Daily Precipitation Prediction Based on Attention Mechanism

Abstract

1. Introduction

2. Data and Methods

2.1. Principal Component Analysis (PCA)

2.2. Convolutional Neural Networks (CNNs)

2.3. Bidirectional Long Short-Term Memory Networks (BiLSTM)

2.4. Attention Mechanism

2.5. PCA-CNN-BiLSTM-Attention Model

2.6. Study Area

2.7. Data

3. Results and Discussions

3.1. Model Parameter Settings

3.2. Evaluation Indicators

3.3. Evaluation of the PCA-CNN-BiLSTM-Attention Model

3.4. Comparison of Results from Different Models

3.5. Robustness Evaluation of the Mode

4. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI