Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a multi-source satellite precipitation product fusion method and device combining sliding quadruple arrangement and random forest spatial interpolation. The method uses the quadruple arrangement analysis technique to estimate the time-varying error characteristics of the original precipitation products and to compute a preliminary weighted-average estimate, and then introduces measured information through a random forest spatial interpolation model to further improve the accuracy of the precipitation estimate.
The technical scheme is as follows: the multi-source satellite precipitation product fusion method combining sliding quadruple arrangement and random forest spatial interpolation comprises the following steps:
(1) Acquiring multisource satellite precipitation products, multisource environment covariate data and actual measurement site daily precipitation data, resampling the multisource precipitation products and the environment covariate data, and unifying the multisource precipitation products and the environment covariate data to the same space-time resolution;
(2) Based on unified space-time resolution, acquiring a multi-source satellite precipitation product time sequence of each grid point on a watershed surface, combining any four precipitation product sequences to construct a sample combination set, estimating error variances and error covariances of the four precipitation sequences in each sample combination in each sliding window period based on a sliding quadruple arrangement analysis method, carrying out arithmetic average on the error variances and the error covariances of the same precipitation products represented in different sample combinations to obtain error variances and inter-product error covariances of the multi-source precipitation products, constructing an error matrix based on the product error variances and the inter-product error covariances, calculating weight coefficients of each product, and further calculating to obtain the satellite precipitation sequence after weighted average;
(3) On a grid with actual measurement sites, taking actual measurement site precipitation as a dependent variable, taking a satellite precipitation sequence after weighted averaging, a multi-source environment covariate sequence, surrounding actual measurement site precipitation sequences and site distances as independent variables, constructing a sample set, dividing training samples and verification samples, and training a random forest spatial interpolation model;
(4) According to the trained random forest spatial interpolation model, on the grid without the actually measured site, the satellite precipitation sequence after weighted average of the grid, the actually measured site precipitation sequence around the grid, the site distance and the environment covariate sequence are used as the input of the random forest spatial interpolation model, the precipitation sequence of the grid point is predicted, the predicted precipitation sequences of all the grid points are integrated, and finally a set of new precipitation product data is formed.
The invention also provides a multi-source satellite precipitation product fusion device combining sliding quadruple arrangement and random forest spatial interpolation, which comprises:
the data preprocessing module is used for acquiring multi-source satellite precipitation products, multi-source environment covariate data and actually-measured site daily precipitation data, resampling the multi-source precipitation products and the environment covariate data, and unifying the multi-source precipitation products and the environment covariate data to the same space-time resolution;
the multi-source precipitation product weighted average module is used for acquiring a multi-source satellite precipitation product time sequence of each grid point on a drainage basin surface based on uniform space-time resolution, combining any four precipitation product sequences to construct a sample combination set, estimating error variances and error covariances of the four precipitation sequences in each sample combination in each sliding window period based on a sliding quadruple arrangement analysis method, carrying out arithmetic average on the error variances and the error covariances of the same precipitation products represented in different sample combinations to obtain error variances and error covariances among the multi-source precipitation products, constructing an error matrix based on the product error variances and the error covariances among the products, calculating weight coefficients of each product, and further calculating to obtain a satellite precipitation sequence after weighted average;
the random forest spatial interpolation model training module is used for constructing a sample set on the grids containing measurement sites, taking measured site precipitation as the dependent variable and the weighted-average satellite precipitation sequence, the multi-source environment covariate sequences, the precipitation sequences of surrounding measurement sites and the site distances as independent variables, dividing the sample set into training samples and verification samples, and training a random forest spatial interpolation model;
the precipitation prediction module is used for, on each grid without a measurement site, taking the weighted-average satellite precipitation sequence of the grid, the precipitation sequences of the measurement sites around the grid, the site distances and the environment covariate sequences as inputs of the trained random forest spatial interpolation model, predicting the precipitation sequence of the grid point, and integrating the predicted precipitation sequences of all grid points to finally form a new set of precipitation product data.
The invention also provides a computer device comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs, when executed by the processors, implement the steps of the multi-source satellite precipitation product fusion method combining sliding quadruple arrangement and random forest spatial interpolation as described above.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multi-source satellite precipitation product fusion method combining sliding quadruple arrangement and random forest spatial interpolation as described above.
Compared with the prior art, the invention has the following beneficial effects:
The invention acquires the multi-source precipitation product sequences of each grid point on the drainage basin surface and constructs a sample combination set. Based on the sliding quadruple arrangement analysis technique, it estimates the error variances and error covariances of the precipitation sequences in each sample combination within each sliding window period and constructs an error feature matrix. The weight coefficient of each precipitation product is calculated from the error feature matrix, and the weighted-average satellite precipitation is then calculated, so that the time-varying error characteristics of the original precipitation products are captured. Measured precipitation information is then introduced through a random forest spatial interpolation model to achieve more accurate prediction. The invention overcomes the inability of existing satellite precipitation product fusion methods to incorporate measured information, and addresses the problem that, when satellite and measured precipitation are fused by machine learning, the fusion effect depends on the quality of the original products. It is therefore of considerable significance for improving the fusion accuracy of satellite precipitation products.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings.
As shown in fig. 1, a multi-source satellite precipitation product fusion method combining sliding quadruple arrangement and random forest spatial interpolation comprises the following steps:
Step (1), acquiring multi-source satellite precipitation products, multi-source environment covariate data and actually-measured site daily precipitation data, resampling the multi-source precipitation products and the environment covariate data, and unifying the multi-source precipitation products and the environment covariate data to the same space-time resolution;
In the embodiment of the invention, the Yili River basin in Xinjiang is taken as the study area. Six satellite precipitation products (ERA5-Land, SM2RAIN, CHIRPS, CMORPH, IMERG and PERSIANN-CDR) are collected for 2010-2020. Elevation data at 90 m resolution are obtained from SRTM, and gradient, slope direction and topographic position index (Tpi) data are calculated from the elevation data. Land surface temperature and NDVI data provided by the MODIS satellites, and 10 m wind speed and wind direction data provided by ERA5-Land, are also collected. These data have different temporal and spatial resolutions. In the spatial dimension, all data are unified to a resolution of 0.1°. In the temporal dimension, sub-daily data are aggregated to the daily scale by summation, and for the 8-day average data provided by MODIS the daily value is taken to be the 8-day average. The collected multi-source satellite precipitation product data and environment covariate data are thereby unified to a 0.1° spatial scale and a daily time scale. The study area contains 22 measurement sites, and the measured daily precipitation data of these 22 sites for 2011-2019 are collected from the hydrological yearbook.
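The temporal unification described above can be illustrated with a minimal sketch. It assumes that sub-daily precipitation arrives as (timestamp, value) pairs, that daily totals are taken over calendar days, and that an 8-day composite is given by its start date and mean value. The function names and the sample values are hypothetical and are not taken from the embodiment; spatial resampling to 0.1° is not shown.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def hourly_to_daily(records):
    """Sum sub-daily precipitation (timestamp, mm) into calendar-day totals."""
    totals = defaultdict(float)
    for stamp, value in records:
        totals[stamp.date()] += value
    return dict(totals)


def expand_composite(start, value, n_days=8):
    """Assign an n-day composite mean to every day that the composite covers."""
    return {start + timedelta(days=i): value for i in range(n_days)}


# Made-up sample values, used only to exercise the two helpers.
hourly = [
    (datetime(2011, 1, 1, 0), 0.2),
    (datetime(2011, 1, 1, 13), 1.3),
    (datetime(2011, 1, 2, 5), 0.5),
]
daily = hourly_to_daily(hourly)
lst_daily = expand_composite(date(2011, 1, 1), 268.4)
```

Here `daily` holds one total per calendar day, and `lst_daily` repeats the composite value on each of the eight days, which mirrors the rule that the daily value equals the 8-day average.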
Step (2), obtaining the multi-source precipitation product sequences of each grid point on the river basin surface and constructing a sample combination set; estimating, based on the sliding quadruple arrangement analysis technique, how the error variances and error covariances of the precipitation sequences in each sample combination change over time; constructing an error feature matrix; calculating a product weight matrix from the error feature matrix; and taking a weighted average according to the product weights to obtain the weighted-average satellite precipitation data;
In the embodiment of the invention, based on the 6 satellite precipitation products $P_1, P_2, \ldots, P_6$ (representing ERA5-Land, SM2RAIN, CHIRPS, CMORPH, IMERG and PERSIANN-CDR) obtained in step (1) after space-time resampling, each grid point on the basin surface yields 6 precipitation time sequences $(P_{1,1}, P_{1,2}, \ldots, P_{1,t})$, $(P_{2,1}, P_{2,2}, \ldots, P_{2,t})$, …, $(P_{6,1}, P_{6,2}, \ldots, P_{6,t})$, where $t$ is the total time length, $P_{1,1}$ is the precipitation value of the first precipitation product at the first moment, and $P_{1,t}$ is the precipitation value of the first precipitation product at moment $t$. Choosing any 4 of these 6 time sequences gives a total of 15 sample combinations, and the 4 sequences within each sample combination are denoted $(W_1, W_2, \ldots, W_t)$, $(X_1, X_2, \ldots, X_t)$, $(Y_1, Y_2, \ldots, Y_t)$ and $(Z_1, Z_2, \ldots, Z_t)$.
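The construction of the sample combination set can be sketched as follows. This is an illustration only; the variable names are hypothetical, and the ordering of the combinations simply follows the order in which the products are listed.

```python
from itertools import combinations

products = ["ERA5-Land", "SM2RAIN", "CHIRPS", "CMORPH", "IMERG", "PERSIANN-CDR"]

# Every way of choosing 4 of the 6 products; each tuple plays the role of (W, X, Y, Z).
sample_combinations = list(combinations(products, 4))

# Number of combinations in which each product takes part.
appearances = {p: sum(p in combo for combo in sample_combinations) for p in products}
```

With six products there are 15 combinations, and each product takes part in 10 of them, so each product receives 10 separate error estimates per window.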
Each sliding window period is set to 101 days. To obtain weighted-average precipitation for day t through day t+k, multi-source precipitation products must therefore be collected for day t-50 through day t+k+50. Since the embodiment requires weighted-average precipitation for the period 20110101-20191231, the first sliding window period is 20101112-20110220.
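The window arithmetic above can be checked with a short sketch. The helper name `window_bounds` is hypothetical; the only assumption is that the window is centred on the target day, with 50 days on either side.

```python
from datetime import date, timedelta

HALF_WINDOW = 50  # a 101-day window is the centre day plus 50 days on each side


def window_bounds(centre):
    """Return the first and last day of the 101-day window centred on `centre`."""
    span = timedelta(days=HALF_WINDOW)
    return centre - span, centre + span


first_start, first_end = window_bounds(date(2011, 1, 1))
last_start, last_end = window_bounds(date(2019, 12, 31))
```

The first window runs from 2010-11-12 to 2011-02-20, and the last window extends to 2020-02-19, which is consistent with collecting the precipitation products for 2010-2020.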
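Taking 20101112-20110220 as the first sliding window period, the four precipitation sequences of each sample combination within the window, $(W_1, W_2, \ldots, W_{101})$, $(X_1, X_2, \ldots, X_{101})$, $(Y_1, Y_2, \ldots, Y_{101})$ and $(Z_1, Z_2, \ldots, Z_{101})$, are used to construct the column vectors $W = [W_1, W_2, \ldots, W_{101}]^{T}$, $X = [X_1, X_2, \ldots, X_{101}]^{T}$, $Y = [Y_1, Y_2, \ldots, Y_{101}]^{T}$ and $Z = [Z_1, Z_2, \ldots, Z_{101}]^{T}$.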
Taking the grid point with center coordinates (82°E, 47°N) as an example, the W, X, Y, Z vectors of the sample combination containing ERA5-Land, SM2RAIN, CHIRPS and CMORPH within the 20101112-20110220 window period are shown in Table 1.
Table 1. W, X, Y, Z vector results
The covariances between the vectors are then calculated.
A covariance matrix B and a coefficient matrix A are constructed from the calculated covariances.
According to the least-squares method of quadruple arrangement analysis, the error feature matrix U (used to calculate the error variance of each vector and the error covariance between vectors) is obtained by solving the equation A × U = B in the least-squares sense, the standard least-squares solution being $U = (A^{T}A)^{-1}A^{T}B$.
Here $\beta_W$, $\beta_X$, $\beta_Y$, $\beta_Z$ are the magnitude conversion coefficients of the W, X, Y, Z sequences relative to the unknown true precipitation sequence; $C_{pp}$ is the variance of the unknown true precipitation sequence; $C_{\varepsilon_W\varepsilon_W}$, $C_{\varepsilon_X\varepsilon_X}$, $C_{\varepsilon_Y\varepsilon_Y}$, $C_{\varepsilon_Z\varepsilon_Z}$ are the error variances of the W, X, Y, Z sequences without magnitude conversion; and $C_{\varepsilon_Y\varepsilon_Z}$ is the error covariance between Y and Z without magnitude conversion.
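The symbols defined above are consistent with the linear error model commonly used in collocation analysis. As a sketch under that assumption (the patent's own matrices A and B are not reproduced here), let $P$ be the unknown true precipitation, let each error be uncorrelated with $P$, and let the errors be mutually uncorrelated except for the pair (Y, Z). Writing $C_{ij}$ for the covariance between sequences $i$ and $j$:

```latex
\begin{aligned}
W &= \beta_W P + \varepsilon_W, \quad
X = \beta_X P + \varepsilon_X, \quad
Y = \beta_Y P + \varepsilon_Y, \quad
Z = \beta_Z P + \varepsilon_Z, \\
C_{ii} &= \beta_i^{2}\, C_{pp} + C_{\varepsilon_i \varepsilon_i},
  \qquad i \in \{W, X, Y, Z\}, \\
C_{ij} &= \beta_i \beta_j\, C_{pp},
  \qquad i \neq j,\ \{i, j\} \neq \{Y, Z\}, \\
C_{YZ} &= \beta_Y \beta_Z\, C_{pp} + C_{\varepsilon_Y \varepsilon_Z}.
\end{aligned}
```

Four sequences give ten distinct covariances (four variances and six cross-covariances), which is what makes a least-squares estimate of the unknowns possible.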
Based on the error feature matrix U, the error variances of W, X, Y, Z over the sliding window period and the error covariance between Y and Z can be calculated.
In the present example, taking the grid point with center coordinates (82°E, 47°N) as an example, the error variance and error covariance results for the W, X, Y, Z vectors of the sample combination containing ERA5-Land, SM2RAIN, CHIRPS and CMORPH in the 20101112-20110220 window period are shown in Table 2.
Table 2. Error variance and error covariance calculation results
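To make the error estimates more concrete, the following minimal sketch recovers unscaled error variances and the Y-Z error covariance from covariances alone. It is a simplification and not the least-squares solution of A × U = B used in the method: it assumes the linear model in which each sequence equals a scaling coefficient times the true precipitation plus an error, uses closed-form ratios of cross-covariances, and is exercised on made-up parameters. All function names are hypothetical.

```python
def error_variance(c_ii, c_ij, c_ik, c_jk):
    """Unscaled error variance of product i.

    c_ij * c_ik / c_jk estimates beta_i^2 * C_pp, provided the errors of
    i, j and k are mutually independent.
    """
    return c_ii - c_ij * c_ik / c_jk


def error_covariance_yz(c_yz, c_wy, c_xz, c_wx):
    """Unscaled error covariance between Y and Z.

    c_wy * c_xz / c_wx estimates beta_Y * beta_Z * C_pp.
    """
    return c_yz - c_wy * c_xz / c_wx


# Made-up model parameters, used only to build a consistent set of covariances.
beta = {"W": 1.0, "X": 0.8, "Y": 1.2, "Z": 0.9}
c_pp = 4.0
err_var = {"W": 0.5, "X": 1.0, "Y": 2.0, "Z": 1.5}
err_cov_yz = 0.6


def cov(i, j):
    """Covariance implied by the assumed linear error model."""
    value = beta[i] * beta[j] * c_pp
    if i == j:
        value += err_var[i]
    elif {i, j} == {"Y", "Z"}:
        value += err_cov_yz
    return value


var_w = error_variance(cov("W", "W"), cov("W", "X"), cov("W", "Y"), cov("X", "Y"))
var_y = error_variance(cov("Y", "Y"), cov("Y", "W"), cov("Y", "X"), cov("W", "X"))
cov_yz = error_covariance_yz(cov("Y", "Z"), cov("W", "Y"), cov("X", "Z"), cov("W", "X"))
```

Note that the estimate for Y deliberately uses W and X as partners, because the errors of Y and Z are allowed to be correlated.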
The error variances calculated for the same precipitation product within the 15 sample combinations are arithmetically averaged. For example, if the first sample combination comprises ERA5-Land, SM2RAIN, CHIRPS and CMORPH and the second comprises ERA5-Land, SM2RAIN, CHIRPS and IMERG, the error variances of ERA5-Land found in the two combinations are averaged, and the arithmetic average is taken to represent the error variance of the ERA5-Land product. This finally yields the error variance of each precipitation product and the inter-product error covariances, from which an error matrix is constructed. The weight coefficient λ_i (i = 1, …, m, where m is the number of products) of each product is then calculated from the error matrix, where E_ij denotes the element in row i and column j of the error matrix. The weighted-average precipitation on day 51 of the window (i.e., 20110101) is the sum of the product precipitation values on that day, each multiplied by its weight coefficient.
In the present example, for the grid point with center coordinates (82°E, 47°N), the weights of the six precipitation products during the 20101112-20110220 window period are shown in Table 3.
Table 3. Weight coefficients of the precipitation products
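The averaging and weighting steps can be sketched as follows. The weight formula used here is an assumption: the sketch uses minimum-variance weights λ = E⁻¹1 / (1ᵀE⁻¹1), a common choice for combining estimates with a known error matrix E, which may differ from the formula of the method. All function names and numbers are hypothetical.

```python
def average_over_combinations(estimates):
    """Average each product's error variance over the combinations containing it.

    `estimates` is a list of dicts {product: error_variance}, one per combination.
    """
    sums, counts = {}, {}
    for combo in estimates:
        for product, value in combo.items():
            sums[product] = sums.get(product, 0.0) + value
            counts[product] = counts.get(product, 0) + 1
    return {p: sums[p] / counts[p] for p in sums}


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def minimum_variance_weights(error_matrix):
    """ASSUMED weighting: lambda = E^-1 1 / (1^T E^-1 1), so the weights sum to 1."""
    raw = solve(error_matrix, [1.0] * len(error_matrix))
    total = sum(raw)
    return [value / total for value in raw]


def weighted_precipitation(weights, values):
    """Weighted average of the product values on one day."""
    return sum(w * v for w, v in zip(weights, values))


# Made-up numbers, used only to exercise the helpers.
avg = average_over_combinations([{"A": 1.0, "B": 2.0}, {"A": 3.0, "C": 4.0}])
weights = minimum_variance_weights([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]])
merged = weighted_precipitation(weights, [7.0, 0.0, 0.0])
```

With a diagonal error matrix the assumed formula reduces to inverse-variance weighting, so the product with the smallest error variance receives the largest weight.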
The start date of the sliding window period is then advanced by one day, i.e., 20101113-20110221 is taken as the new window period. The above steps are repeated to calculate the error variances of the precipitation products and the inter-product error covariances within this window period and to construct the error matrix and calculate the weight coefficient of each product, which gives the weighted-average precipitation on day 51 of the window (i.e., 20110102). The window period is moved forward in this way until the weighted-average precipitation at every time point has been obtained, finally yielding the weighted-average precipitation sequence.
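The sliding procedure can be sketched as a loop over window centres. In this illustration `estimate_weights` is a hypothetical callback standing in for the error estimation and weight calculation of one window; an equal-weight placeholder is used only to exercise the loop.

```python
def sliding_weighted_series(series, estimate_weights, half_window=50):
    """Return the weighted-average value for every day that has a full window.

    `series` maps product name -> list of daily values (all of equal length).
    `estimate_weights` is a hypothetical callback that receives the windowed
    series and returns {product: weight}.
    """
    names = list(series)
    n_days = len(series[names[0]])
    merged = []
    for centre in range(half_window, n_days - half_window):
        window = {
            name: series[name][centre - half_window : centre + half_window + 1]
            for name in names
        }
        weights = estimate_weights(window)
        merged.append(sum(weights[name] * series[name][centre] for name in names))
    return merged


def equal_weights(window):
    """Placeholder weighting, used only to exercise the loop."""
    return {name: 1.0 / len(window) for name in window}


# Made-up series of 103 days: only 3 days have a complete 101-day window.
demo = {"P1": [float(d) for d in range(103)], "P2": [2.0 * d for d in range(103)]}
merged = sliding_weighted_series(demo, equal_weights)
```

Each iteration re-estimates the weights from its own 101-day window and applies them only to the centre day, which is how the time-varying error characteristics enter the merged sequence.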
Step (3), on the grid with the actually measured site, taking the actually measured site precipitation as a dependent variable, taking weighted average satellite precipitation data, multi-source environment covariate data and surrounding actually measured site precipitation and site distances as independent variables, constructing a sample set, dividing a training sample and a verification sample, and training by adopting a random forest spatial interpolation model;
The random forest spatial interpolation model is based on the random forest model. To account for the spatial autocorrelation among observations, the k observed values nearest to the predicted point, together with the distances from those observation positions to the predicted position, are treated as additional covariates in the random forest. The inputs of the model are the precipitation estimated by the precipitation product at the predicted point, the environment covariate data, the observed precipitation near the predicted point and the distances between the observation points and the predicted point; the output is the real precipitation at the predicted point. The expression is as follows:
$$z(L_p) = f\big(P(L_p),\ En(L_p),\ z(L_1),\ d_1,\ z(L_2),\ d_2,\ \ldots,\ z(L_k),\ d_k\big)$$
Here $z(L_p)$ is the actual precipitation at the predicted point $L_p$; $P(L_p)$ is the precipitation estimated by the precipitation product at the predicted point $L_p$; and $En(L_p)$ is the environment covariate sequence at the predicted point $L_p$, comprising longitude, latitude, elevation, gradient, slope direction, topographic position index, land surface temperature, normalized vegetation index NDVI, 10 m wind speed and 10 m wind direction. $z(L_1), z(L_2), \ldots, z(L_k)$ are the measured precipitation values at the k observation positions nearest to the predicted point, and $d_1, d_2, \ldots, d_k$ are the distances between those k observation positions and the predicted point.
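The arguments of the expression above can be assembled as in the following minimal sketch. The distance metric is an assumption (great-circle distance, since the source does not state which metric is used), and the function names, station coordinates and values are hypothetical.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km (ASSUMED metric; mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rfsi_features(point, p_merge, env, stations, k):
    """Build [P(Lp), En(Lp)..., z(L1), d1, ..., z(Lk), dk] for one point and one day.

    `point` is (lon, lat); `stations` is a list of (lon, lat, observed_precipitation).
    """
    ranked = sorted(
        (haversine_km(point[0], point[1], lon, lat), obs) for lon, lat, obs in stations
    )[:k]
    features = [p_merge, *env]
    for dist, obs in ranked:
        features.extend([obs, dist])
    return features


# Made-up stations and covariates, used only to exercise the helper.
stations = [(82.0, 47.5, 1.2), (82.0, 47.1, 0.4), (83.0, 47.0, 3.0)]
env = [82.0, 47.0, 1500.0]
features = rfsi_features((82.0, 47.0), 2.0, env, stations, k=2)
```

When building training samples for a grid that itself contains a station, that station would presumably be left out of `stations`, since otherwise its own observation would appear among the predictors at distance zero.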
Further, step (3) includes:
Based on the collected measured precipitation data, for each measurement site j (j = 1 to n, where n is the total number of measurement sites), the measured precipitation P_obs_{j,t} of the site at each moment is extracted, and a dependent variable vector is constructed;
For the grid in which the site is located, the following are extracted: the weighted-average precipitation P_merge_{j,t}; the environment variables (longitude lon_j, latitude lat_j, elevation Elevation_j, gradient Slope_j, slope direction Aspect_j, topographic position index Tpi_j, land surface temperature LST_{j,t}, normalized vegetation index NDVI_{j,t}, 10 m wind speed WINDSPEED_{j,t} and 10 m wind direction Winddirection_{j,t}); the measured precipitation at the same moment at the k measurement points nearest to the grid, P_obs_near1_t, P_obs_near2_t, …, P_obs_neark_t; and the corresponding site distances D_obs_near1, D_obs_near2, …, D_obs_neark;
in the embodiment of the invention, k is 10, namely, the influence of precipitation of 10 real measurement points closest to the target grid point on the target grid point is considered, and the independent variable and dependent variable matrix partial results are shown in table 4.
Table 4. Independent and dependent variable matrices
Table 4 (continued). Independent and dependent variable matrices
Further, the independent variable and dependent variable data samples are divided into training samples and verification samples by ten-fold cross-validation. The model parameters of the random forest spatial interpolation model comprise the number of variables considered at each node split (mtry), the minimum node size (min.node.size) and the fraction of observations sampled for each decision tree (sample.fraction). Parameter combinations are drawn by random sampling, the random forest spatial interpolation model is trained under each parameter combination, and the optimal parameter combination is selected according to the sum of squared errors on the verification data set, giving the trained random forest spatial interpolation model.
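The parameter search described above can be sketched as a random search wrapped around ten-fold cross-validation. In this illustration `fit_and_score` is a hypothetical callback that would train a random forest on the training indices and return the sum of squared errors on the verification indices; the parameter ranges are illustrative assumptions, and a dummy scorer is used only to exercise the search.

```python
import random


def ten_fold_indices(n_samples, seed=0):
    """Shuffle the sample indices and split them into ten folds."""
    rng = random.Random(seed)
    order = list(range(n_samples))
    rng.shuffle(order)
    return [order[i::10] for i in range(10)]


def random_search(n_samples, n_features, fit_and_score, n_trials=20, seed=0):
    """Sample parameter sets at random and keep the one with the lowest total SSE.

    `fit_and_score(params, train_idx, valid_idx)` is a hypothetical callback that
    trains a model and returns the sum of squared errors on the validation fold.
    """
    rng = random.Random(seed)
    folds = ten_fold_indices(n_samples, seed)
    best_params, best_sse = None, float("inf")
    for _ in range(n_trials):
        params = {
            "mtry": rng.randint(1, n_features),
            "min.node.size": rng.randint(1, 20),       # illustrative range
            "sample.fraction": rng.uniform(0.5, 1.0),  # illustrative range
        }
        sse = 0.0
        for i, valid_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            sse += fit_and_score(params, train_idx, valid_idx)
        if sse < best_sse:
            best_params, best_sse = params, sse
    return best_params, best_sse


def dummy_score(params, train_idx, valid_idx):
    """Stand-in for model training; NOT a real random forest."""
    return abs(params["mtry"] - 5) * len(valid_idx)


folds = ten_fold_indices(100)
best_params, best_sse = random_search(100, 10, dummy_score)
```

The parameter names mtry, min.node.size and sample.fraction appear to follow the argument names of the R package ranger; a real `fit_and_score` would pass them to whichever random forest implementation is used.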
In the embodiment of the invention, the parameter results of the trained random forest spatial interpolation model are shown in Table 5.
Table 5. Random forest spatial interpolation model parameters
Step (4), according to the constructed random forest spatial interpolation model, satellite precipitation data after weighted averaging of the grid, actual measurement site precipitation around the grid and site distance and environment covariate data are taken as model inputs on the grid without actual measurement site, predicted precipitation of the grid is output, predicted precipitation sequences of the grid are integrated, and finally a set of new precipitation product data is formed;
Further, step (4) specifically includes: for any grid without a measurement site, extracting the weighted-average precipitation P_merge_t obtained on the grid in step (2); the environment variables (longitude lon, latitude lat, elevation Elevation, gradient Slope, slope direction Aspect, topographic position index Tpi, land surface temperature LST_t, normalized vegetation index NDVI_t, 10 m wind speed WINDSPEED_t and 10 m wind direction Winddirection_t); the measured precipitation at the same moment at the k measurement points nearest to the grid point, P_obs_near1_t, P_obs_near2_t, …, P_obs_neark_t; and the corresponding site distances D_obs_near1, D_obs_near2, …, D_obs_neark. An independent variable matrix is constructed from these and used as the input of the random forest spatial interpolation model trained in step (3), which outputs the predicted precipitation sequence P_predict at the grid point. The predicted precipitation sequences of all grid points are integrated to finally form a new set of precipitation product data.
In the embodiment of the invention, partial results of the independent variable matrix for grids without measurement sites, and of the predicted precipitation output by the random forest spatial interpolation model, are shown in Table 6.
Table 6. Independent variable matrix of grids without measurement sites and partial predicted precipitation results
Table 6 (continued). Independent variable matrix of grids without measurement sites and partial predicted precipitation results
The invention evaluates the fusion effect of the proposed multi-source satellite daily-scale precipitation product fusion method, which combines the sliding quadruple arrangement analysis technique with the random forest spatial interpolation model, using evaluation indexes commonly applied to precipitation products: the Pearson correlation coefficient, the root mean square error (RMSE), the probability of detection (POD), the false alarm ratio (FAR) and the Heidke skill score (HSS).
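These indexes can be computed as in the following minimal sketch, which uses their standard definitions. The rain/no-rain threshold of 0.1 mm/day is an assumption (the source does not state the threshold used), and the sample values are made up.

```python
import math


def pearson(obs, sim):
    """Pearson correlation coefficient between observed and simulated values."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def detection_scores(obs, sim, threshold=0.1):
    """POD, FAR and HSS from the rain/no-rain contingency table.

    `threshold` (mm/day) is an ASSUMED wet-day cutoff.
    """
    pairs = list(zip(obs, sim))
    hits = sum(o >= threshold and s >= threshold for o, s in pairs)
    false_alarms = sum(o < threshold and s >= threshold for o, s in pairs)
    misses = sum(o >= threshold and s < threshold for o, s in pairs)
    correct_neg = sum(o < threshold and s < threshold for o, s in pairs)
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    hss = 2 * (hits * correct_neg - false_alarms * misses) / (
        (hits + misses) * (misses + correct_neg)
        + (hits + false_alarms) * (false_alarms + correct_neg)
    )
    return pod, far, hss


# Made-up daily values (mm/day), used only to exercise the functions.
obs = [0.0, 2.0, 5.0, 0.0, 1.0, 0.0]
sim = [0.0, 1.0, 4.0, 3.0, 0.0, 0.0]
cc = pearson(obs, sim)
err = rmse(obs, sim)
pod, far, hss = detection_scores(obs, sim)
```

Higher values of the correlation coefficient, POD and HSS and lower values of RMSE and FAR indicate better agreement with the measured precipitation.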
In the embodiment of the invention, the calculated evaluation indexes are shown in Table 7. The results are presented as mean ± standard deviation, where the mean is the arithmetic average of a precipitation product's evaluation index over the different measurement sites and the standard deviation is the standard deviation of that index over the same sites.
Table 7. Evaluation indexes of the original and fused products
According to the results in Table 7, after the precipitation products from different sources are fused by the fusion method provided by the invention, the accuracy of the precipitation estimate improves markedly: the correlation coefficient rises above 0.70, the RMSE falls below 2.5, the POD rises above 0.95, the FAR falls below 0.47 and the HSS rises above 0.55.
Based on the same technical concept as the method embodiment, another embodiment of the present invention further provides a multi-source satellite precipitation product fusion device combining sliding quadruple arrangement and random forest spatial interpolation, including:
the data preprocessing module is used for acquiring multi-source satellite precipitation products, multi-source environment covariate data and actually-measured site daily precipitation data, resampling the multi-source precipitation products and the environment covariate data, and unifying the multi-source precipitation products and the environment covariate data to the same space-time resolution;
the multi-source precipitation product weighted average module is used for acquiring a multi-source satellite precipitation product time sequence of each grid point on a drainage basin surface based on uniform space-time resolution, combining any four precipitation product sequences to construct a sample combination set, estimating error variances and error covariances of the four precipitation sequences in each sample combination in each sliding window period based on a sliding quadruple arrangement analysis method, carrying out arithmetic average on the error variances and the error covariances of the same precipitation products represented in different sample combinations to obtain error variances and error covariances among the multi-source precipitation products, constructing an error matrix based on the product error variances and the error covariances among the products, calculating weight coefficients of each product, and further calculating to obtain a satellite precipitation sequence after weighted average;
the random forest spatial interpolation model training module is used for constructing a sample set on the grids containing measurement sites, taking measured site precipitation as the dependent variable and the weighted-average satellite precipitation sequence, the multi-source environment covariate sequences, the precipitation sequences of surrounding measurement sites and the site distances as independent variables, dividing the sample set into training samples and verification samples, and training a random forest spatial interpolation model;
the precipitation prediction module is used for, on each grid without a measurement site, taking the weighted-average satellite precipitation sequence of the grid, the precipitation sequences of the measurement sites around the grid, the site distances and the environment covariate sequences as inputs of the trained random forest spatial interpolation model, predicting the precipitation sequence of the grid point, and integrating the predicted precipitation sequences of all grid points to finally form a new set of precipitation product data.
It should be understood that, the multi-source satellite precipitation product fusion device combining sliding quadruple arrangement and random forest spatial interpolation in the embodiment of the present invention may implement all the technical solutions in the above method embodiments, and the functions of each functional module may be specifically implemented according to the methods in the above method embodiments, and the specific implementation process may refer to the relevant descriptions in the above embodiments, which are not repeated herein.
The invention also provides a computer device comprising one or more processors, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the programs, when executed by the processors, implement the steps of the multi-source satellite precipitation product fusion method combining sliding quadruple arrangement and random forest spatial interpolation as described above.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the multi-source satellite precipitation product fusion method combining sliding quadruple arrangement and random forest spatial interpolation as described above.
It will be appreciated by those skilled in the art that embodiments of the invention may be provided as a method, apparatus (system), computer device, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The invention is described with reference to flow charts of methods according to embodiments of the invention. It will be understood that each flow in the flowchart, and combinations of flows in the flowchart, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows.