CN120764333A - Stud low cycle fatigue life prediction method - Google Patents

Stud low cycle fatigue life prediction method

Info

Publication number
CN120764333A
Authority
CN
China
Prior art keywords
model
feature
prediction
sample
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202510838505.3A
Other languages
Chinese (zh)
Inventor
金增选
周霄天
王晓臣
肖华裕
童俊辉
张腾达
郑海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Urban Infrastructure Construction And Development Center
Original Assignee
Ningbo Urban Infrastructure Construction And Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Urban Infrastructure Construction And Development Center
Priority to CN202510838505.3A
Publication of CN120764333A
Legal status: Pending


Abstract

Translated from Chinese

The present invention discloses a method for predicting the low-cycle fatigue life of studs. By using machine learning algorithms, it addresses the long duration, high labor intensity and low prediction accuracy of traditional testing, saving test costs and shortening the R&D cycle. Integrating XGBoost and RF significantly improves prediction accuracy while maintaining accuracy on small samples. SHAP meets the need for model interpretation, addressing the difficulty of understanding the decision process of a complex, black-box model. Accurate prediction avoids over-design and saves material, thereby reducing energy consumption, while early warning for high-risk studs helps avoid catastrophic accidents.

Description

Stud low cycle fatigue life prediction method
Technical Field
The invention relates to the technical field of bridge engineering, in particular to a method for predicting the low cycle fatigue life of a stud.
Background
The stud is a key force-transmitting element of steel-concrete composite structures, and its fatigue performance directly affects the service safety and service life of the bridge structure. Under dynamic actions such as vehicle loads and earthquakes, the stud is subjected to high-frequency, high-amplitude cyclic stress, and low-cycle fatigue failure is a key cause of damage to, and even collapse of, bridge structures. Traditional stud fatigue life prediction relies mainly on three technical means: push-out tests, finite element numerical simulation, and empirical formulas.
The experimental test method applies cyclic load to the stud specimen with a high-frequency fatigue testing machine, monitors crack initiation and propagation through strain gauges, and finally counts the number of cycles to failure. Although this method is highly reliable, a single test is costly (on the order of tens of thousands of yuan), takes a long time (usually more than one month), and cannot easily simulate the coupled action of complex environments. The finite element simulation method builds a refined three-dimensional stud model in software such as ABAQUS and predicts service life using Miner's linear cumulative damage theory. This method is limited by its sensitivity to model parameters (for example, errors in the friction coefficient can cause prediction deviations of more than 20%), and a calculation takes tens of hours, so it cannot meet the need for rapid engineering evaluation. The AASHTO specification formula method estimates stud fatigue life using empirical correction coefficients; the simplified formula does not consider key factors such as degradation of the material microstructure and non-stationarity of the load spectrum, so prediction errors are larger under heavy traffic or coupled earthquake-fatigue scenarios.
Therefore, a method for predicting the low-cycle fatigue life of a stud is provided.
Disclosure of Invention
The invention aims to provide a stud low cycle fatigue life prediction method for solving the problems in the background technology.
To solve the above technical problems, the invention provides the following technical scheme: a method for predicting the low-cycle fatigue life of a stud, comprising the following steps:
S1, establishing an original data set, wherein the original data set comprises a plurality of input features and the low-cycle fatigue life of the stud, the input features being the independent variables and the low-cycle fatigue life of the stud being the output;
S2, performing data standardization and feature screening on the input features in the original data set to obtain a feature-screened data set;
S3, dividing the feature-screened data set into a training set and a test set;
S4, using the training set, constructing a multi-model prediction model based on a linear regression model, a tree ensemble model, a gradient boosting model, a probabilistic model and a kernel function model;
S5, using the test set, searching for optimal parameters by five-fold cross-validation and grid search, and testing the reliability of the multi-model prediction model until the prediction accuracy and training time meet preset conditions, to obtain the optimal model ensemble;
S6, dynamically adjusting the weight of each model in the optimal model ensemble, fusing the model weights, and constructing a weighted voting mechanism to obtain an integrated model;
S7, predicting the low-cycle fatigue life of the target stud using the integrated model;
S8, introducing the SHAP interpretation tool to explain the prediction results of the integrated model in detail.
According to the above technical scheme, in step S1 the input features are of three types: load parameters, material parameters and geometric parameters;
The load parameters comprise the load value $P$, the stress amplitude $\Delta\tau$, the maximum stress $\tau_{max}$, the minimum stress $\tau_{min}$ and the mean stress $\tau_{mean}$;
The material parameters comprise the concrete strength $f_c$ and the stud tensile strength $f_u$;
The geometric parameter is the stud diameter $d$.
According to the above technical scheme, in S2,
Z-score standardization is applied to the input features to eliminate differences in dimension, with the formula:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $\mu$ is the mean of the original data, $\sigma$ is the standard deviation of the original data, and $x_{norm}$ is the standardized data, which has mean 0 and standard deviation 1;
The input features are screened using the Spearman rank correlation coefficient, with the formula:
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)}$$
where $d_i$ is the rank difference of the i-th sample on the two variables, $n$ is the total number of samples, and $\rho$ is the correlation coefficient, which represents the strength and direction of the rank correlation between the two variables and ranges from -1 to 1.
According to the above technical scheme, in S4 the multi-model prediction model is constructed as follows:
S4.1, training a linear regression model:
A linear mapping between the load parameters and the low-cycle fatigue life of the stud is established by the least squares method; the linear regression model is:
$$N_f = \beta_0 + \beta_1\Delta\tau + \beta_2\tau_{max} + \varepsilon$$
where $N_f$ is the fatigue life, $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the regression coefficients of the stress amplitude $\Delta\tau$ and the maximum stress $\tau_{max}$, and $\varepsilon$ is the random error term;
The linear contributions of the input features are quantified by the coefficients $\beta_i$, providing a baseline for the subsequent nonlinear models; the closed-form solution is:
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y$$
where $X$ is the design matrix, $y$ is the vector of observed lives, and $\hat{\beta}$ is the coefficient vector estimated by least squares;
S4.2, training a tree ensemble model:
Decision tree regression model: a recursive binary splitting strategy is adopted, with the stress amplitude $\Delta\tau$ and the stud tensile strength $f_u$ as key splitting features; a single decision tree is constructed from their nonlinear relation, and piecewise-constant (zero-order) nonlinear fitting is achieved by predicting the local mean. The split-threshold optimization problem is:
$$\min_{j,s}\left[\sum_{x_i \in R_L(j,s)} (y_i - \bar{y}_L)^2 + \sum_{x_i \in R_R(j,s)} (y_i - \bar{y}_R)^2\right]$$
where $j$ is the current splitting feature, $s$ is the split threshold, $R_L$ and $R_R$ are the left and right sub-regions after splitting, $\bar{y}_L$ and $\bar{y}_R$ are the mean lives of the samples in the left and right sub-regions, and $y_i$ is the true life of the i-th sample;
Random forest model: multiple decision trees are combined, and variance is reduced and generalization improved by randomly selecting feature subspaces and sample subsets:
$$\hat{y}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$$
where $\hat{y}(x)$ is the average predicted value for input $x$, $B$ is the number of decision trees, $T_b$ is the prediction function of the b-th tree, and $x$ is the input feature vector;
Extremely randomized trees model: additional randomness is introduced in the feature split points and in sample selection to improve model diversity:
$$\xi_j \sim U(\min(x_j), \max(x_j))$$
where $\xi_j$ is the random split threshold of feature $j$ and $U$ denotes the uniform distribution;
S4.3, training a gradient boosting model:
Gradient boosting tree model: multiple weak learners are trained iteratively to improve the prediction step by step; the model at each step fits the residual of the previous step, the residual is computed from the gradient direction of the loss function to correct the prediction deviation, and finally all weak models are combined by weighting into a strong prediction model. The formulas are as follows:
Loss function: $L(y, F(x))$
where $L$ is the loss function measuring the difference between the model prediction and the true value, $y$ is the true label of the sample, $F(x)$ is the model prediction for input $x$, and $x$ is the input feature vector;
Residual calculation (step t):
$$r_{i,t} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{t-1}}$$
where $r_{i,t}$ is the residual of the i-th sample at step t; $\partial L / \partial F$ is the partial derivative of the loss function with respect to the model, i.e. the rate of change of the loss with the predicted value; $F_{t-1}(x_i)$ is the prediction of the previous-step (step t-1) model for sample $x_i$; and $x_i$ is the input feature vector of the i-th sample;
Model update (learning rate $\eta$): $F_t(x) = F_{t-1}(x) + \eta \cdot h_t(x)$
where $F_t(x)$ is the prediction of the step-t model for input $x$, $\eta$ is the learning rate, and $h_t(x)$ is the prediction of the t-th decision tree for input $x$;
Extreme gradient boosting tree model: regularization (L1/L2) and a second-order Taylor expansion are introduced to prevent overfitting, and parallel computation and missing-value handling are supported. The objective is:
$$\mathcal{L} = \sum_{i=1}^{n} l(y_i, F(x_i)) + \sum_{k=1}^{K} \Omega(f_k)$$
where $\mathcal{L}$ is the overall objective function of the model; $l(y_i, F(x_i))$ is the loss of the i-th sample, measuring the error of the prediction $F(x_i)$ against the true value $y_i$; $y_i$ is the true label or target value of the i-th sample; $F(x_i)$ is the model prediction for the i-th sample $x_i$; $x_i$ is the input feature vector of the i-th sample; $\Omega(f_k)$ is the regularization term of the k-th tree, which controls model complexity; and $f_k$ is the k-th decision tree;
S4.4, training a probabilistic model:
Gaussian process regression: a kernel function is constructed to map the nonlinear relation and to provide an estimate of prediction uncertainty:
$$k(x_i, x_j) = \sigma^2 \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2l^2}\right)$$
where $k(x_i, x_j)$ is the kernel function value between inputs $x_i$ and $x_j$, measuring their similarity; $\sigma^2$ is the signal variance, controlling the amplitude of the function; and $l$ is the length scale, controlling how smoothly the function varies;
Bayesian regression model: the posterior distribution of the model parameters is quantified using Bayes' theorem, and confidence intervals for the predicted low-cycle fatigue life of the stud are output:
Posterior distribution, assuming the Gaussian prior $\beta \sim N(0, \Sigma)$:
$$P(\beta \mid y, X) = \frac{P(y \mid X, \beta)\,P(\beta)}{P(y \mid X)}, \qquad \beta \mid y, X \sim N(\bar{\beta}, A^{-1})$$
where $P(\beta \mid y, X)$ is the posterior probability distribution of the model parameters $\beta$ given the data $y$ and $X$; $\beta$ is the parameter vector of the model; $y$ is the observed target vector; $X$ is the input feature matrix; $P(y \mid X, \beta)$ is the probability of observing the target values $y$ given the features $X$ and parameters $\beta$; $P(\beta)$ is the prior probability distribution of $\beta$; and $P(y \mid X)$ is the probability of observing $y$ given the features $X$;
$N(0, \Sigma)$ is the Gaussian distribution with mean 0 and covariance matrix $\Sigma$;
$N(\bar{\beta}, A^{-1})$ is the Gaussian distribution with mean $\bar{\beta}$ and covariance matrix $A^{-1}$; $\bar{\beta}$ is the posterior mean, typically the maximum a posteriori estimate of the parameters, and $A^{-1}$ is the posterior covariance matrix, i.e. the inverse of the posterior precision matrix $A$;
S4.5, training a kernel function model:
The data are mapped to a high-dimensional space by a kernel function and fitted within an $\epsilon$-insensitive band:
Optimization problem:
$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\lVert w \rVert^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*)$$
where $w$ is the model weight vector, $C$ is the regularization parameter controlling the penalty on training errors, and $\xi_i$, $\xi_i^*$ are slack variables representing the training error of sample i beyond the upper and lower boundaries, respectively;
Constraints:
$$y_i - (w^{T}\phi(x_i) + b) \le \epsilon + \xi_i, \qquad (w^{T}\phi(x_i) + b) - y_i \le \epsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0$$
where $y_i$ is the true target value of the i-th sample, $\phi$ is the feature map induced by the kernel, $b$ is the bias term of the model, and $\epsilon$ is the width of the insensitive band, meaning that errors within the $\epsilon$ range incur no loss;
Prediction equation:
$$f(x) = \sum_{i=1}^{n}(\alpha_i - \alpha_i^*)K(x_i, x) + b$$
where $f(x)$ is the model prediction, $\alpha_i$ and $\alpha_i^*$ are the Lagrange multipliers corresponding to the upper and lower constraint boundaries of sample i, and $K(x_i, x)$ is the kernel function.
According to the above technical scheme, the specific steps of S5 are as follows:
Using the test set, a hyper-parameter space is defined and all parameter combinations are generated by grid search; each model is trained and evaluated by five-fold cross-validation (K=5) to search for the optimal parameters, and the reliability of the multi-model prediction model is tested until the prediction accuracy and training time meet preset conditions, giving the optimal model ensemble;
Assume the hyper-parameter value sets $\Theta_1, \Theta_2, \ldots, \Theta_k$;
The search space is the Cartesian product of all hyper-parameter value sets:
$$\Theta = \Theta_1 \times \Theta_2 \times \cdots \times \Theta_k$$
The optimal parameter combination $\theta^*$ is chosen to maximize the model evaluation index $F(\theta)$:
$$\theta^* = \arg\max_{\theta \in \Theta} F(\theta)$$
where $\Theta_1$, $\Theta_2$ and $\Theta_k$ are the value sets of the first, second and k-th hyper-parameters; $\Theta$ is the hyper-parameter search space, containing all possible hyper-parameter combinations; $\theta^*$ is the optimal hyper-parameter combination; and $F(\theta)$ is the evaluation index of the model under the hyper-parameter combination $\theta$;
The optimal model ensemble is evaluated using the root mean square error (RMSE), the mean absolute percentage error (MAPE) and the coefficient of determination $R^2$:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}, \qquad MAPE = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $n$ is the sample size, and $\bar{y}$ is the mean of the actual values.
According to the above technical scheme, the specific steps of S6 are as follows:
S6.1, generating a plurality of decision trees with the random forest model and extracting feature importance weights:
$$\mathrm{Importance}(f) = \frac{1}{T}\sum_{t=1}^{T}\sum_{s \in S_f}\Delta\mathrm{Impurity}(s)$$
where $\mathrm{Importance}(f)$ is the importance of feature $f$, $T$ is the total number of decision trees, $S_f$ is the set of nodes whose splitting feature is $f$, and $\Delta\mathrm{Impurity}(s)$ is the reduction in impurity after node $s$ is split;
The feature importance is then normalized so that the feature importance weight $w_f$ lies in [0,1]:
$$w_f = \frac{\mathrm{Importance}(f)}{\sum_{j=1}^{F}\mathrm{Importance}(j)}$$
where $w_f$ is the normalized importance weight of feature $f$, $F$ is the total number of features, and $\mathrm{Importance}(j)$ is the unnormalized importance of the j-th feature;
S6.2, performing gradient boosting on the input features based on the extreme gradient boosting tree model, and dynamically adjusting the sample weights;
First, the sample weights are calculated; the weight $S_i$ of sample i is determined by a weighted sum over all features:
$$S_i = \sum_{f=1}^{F} w_f \cdot |x_{i,f} - \mu_f|$$
where $S_i$ is the weight of the i-th sample, $w_f$ is the normalized weight of feature $f$ obtained in S6.1, $\mu_f$ is the mean of feature $f$, and $|x_{i,f} - \mu_f|$ measures how far sample i deviates on feature $f$;
Second, the sample weight $S_i$ is introduced into the objective function of the extreme gradient boosting tree model:
$$\mathcal{L} = \sum_{i=1}^{n} S_i \cdot l(y_i, F(x_i)) + \sum_{k=1}^{K}\Omega(f_k)$$
where $\mathcal{L}$ is the overall objective function of the model after the sample weights are introduced;
S6.3, fusing the feature importance weights of the random forest model with the sample weights of the extreme gradient boosting tree model, constructing a weighted voting mechanism, and generating the final prediction result to obtain the integrated model, wherein the fusion formula is as follows:
In the fusion formula, the initial prediction of the random forest model for the input feature vector x is combined with the residual predicted by the extreme gradient boosting tree model, and α is a weight hyper-parameter controlling the relative contribution of the random forest initial prediction and the residual correction of the extreme gradient boosting tree model.
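The weighting in S6.1 and S6.2 can be sketched in a few lines of plain Python. This is an illustration only: the raw importance values, the feature names and the standardized sample rows are hypothetical, and the fusion step of S6.3 is left out because its formula is not reproduced in this text.

```python
def normalize_importance(raw):
    """w_f = Importance(f) / sum_j Importance(j), so the weights sum to 1."""
    total = sum(raw.values())
    return {f: v / total for f, v in raw.items()}

def sample_weights(rows, weights):
    """S_i = sum_f w_f * |x_{i,f} - mu_f| for each sample (row: feature -> value)."""
    n = len(rows)
    means = {f: sum(r[f] for r in rows) / n for f in weights}
    return [sum(w * abs(r[f] - means[f]) for f, w in weights.items())
            for r in rows]

# Hypothetical unnormalized importances from a random forest.
raw_importance = {"delta_tau": 6.0, "tau_max": 3.0, "f_u": 1.0}
w = normalize_importance(raw_importance)

# Hypothetical standardized samples; the middle one sits at the feature means.
rows = [
    {"delta_tau": -1.0, "tau_max": -1.0, "f_u": 0.0},
    {"delta_tau": 0.0, "tau_max": 0.0, "f_u": 0.0},
    {"delta_tau": 1.0, "tau_max": 1.0, "f_u": 0.0},
]
S = sample_weights(rows, w)
```

Samples far from the feature means on important features receive larger weights, which is what lets the boosted model emphasize them in its weighted objective.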
According to the above technical scheme, the specific steps of S8 are as follows:
S8.1, adopting the Kernel SHAP algorithm in the SHAP framework, which approximates Shapley values by weighted linear regression, to calculate the marginal contribution of each input feature to the predicted value; for the stud life prediction model, the SHAP value of input feature i is defined as:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}\left[f(S \cup \{i\}) - f(S)\right]$$
where $N$ is the set of all input features, $S$ is a subset of features that does not contain input feature i, $\phi_i$ is the SHAP value of input feature i, and $f(S)$ is the predicted output of the model given the feature subset $S$;
S8.2, calculating input feature importance: based on the global interpretation of the SHAP values, the average contribution of each input feature to the life prediction is calculated;
S8.3, performing local decision analysis, in which an individual prediction result is explained by analyzing the combined influence of the input features.
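To make the Shapley definition in S8.1 concrete, the following sketch computes exact Shapley values by enumerating every feature subset. Note that this is exhaustive enumeration, not the Kernel SHAP approximation named in the text, and it is only practical for a handful of features. The additive value function and its contribution numbers are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley value of each feature, summing over subsets S without i."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! for subsets of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

# Hypothetical additive model: base life plus a contribution per known feature.
BASE_LIFE = 6000.0
CONTRIB = {"delta_tau": -3000.0, "tau_max": -1200.0, "f_u": 800.0}

def toy_model(subset):
    return BASE_LIFE + sum(CONTRIB[f] for f in subset)

phi = shapley_values(list(CONTRIB), toy_model)
```

For an additive model each Shapley value equals that feature's own contribution, and in general the values sum to the difference between the full-feature prediction and the empty-set baseline.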
Compared with the prior art, the invention has the following beneficial effects:
(1) The machine learning algorithm addresses the long duration, high labor intensity and low prediction accuracy of traditional testing, saving test costs and shortening the research and development cycle;
(2) Integrating XGBoost and RF significantly improves prediction accuracy while maintaining accuracy on small samples;
(3) SHAP meets the need for model interpretation, addressing the difficulty of understanding the decision process caused by the model's complexity and black-box nature;
(4) Accurate prediction avoids over-design and saves material, thereby reducing energy consumption, while early warning for high-risk studs helps avoid catastrophic accidents.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a technical flow chart of the present invention;
FIG. 2 is a correlation heat map of the present invention;
FIG. 3 is a graph of integrated model comparison results of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1-3, the invention provides a method for predicting the low-cycle fatigue life of a stud, comprising the following steps:
S1, establishing an original data set, wherein the original data set comprises a plurality of input features and the low-cycle fatigue life of the stud, the input features being the independent variables and the low-cycle fatigue life of the stud being the output;
The input features are of three types: load parameters, material parameters and geometric parameters. The load parameters comprise the load value $P$, the stress amplitude $\Delta\tau$, the maximum stress $\tau_{max}$, the minimum stress $\tau_{min}$ and the mean stress $\tau_{mean}$; the material parameters comprise the concrete strength $f_c$ and the stud tensile strength $f_u$; and the geometric parameter is the stud diameter $d$;
S2, performing data standardization and feature screening on the input features in the original data set to obtain a feature-screened data set;
Z-score standardization is applied to the input features to eliminate differences in dimension, with the formula:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where $\mu$ is the mean of the original data, $\sigma$ is the standard deviation of the original data, and $x_{norm}$ is the standardized data, which has mean 0 and standard deviation 1;
The input features are screened using the Spearman rank correlation coefficient, a non-parametric statistic that measures the monotonic correlation between two variables, with the formula:
$$\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)}$$
where $d_i$ is the rank difference of the i-th sample on the two variables, $n$ is the total number of samples, and $\rho$ is the correlation coefficient, which represents the strength and direction of the rank correlation between the two variables and ranges from -1 to 1;
As shown in FIG. 2, the stud tensile strength, the maximum stress of a single stud and the stress amplitude are determined as the core input feature variables according to their degree of correlation;
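The two preprocessing formulas of S2 can be sketched with the Python standard library as follows. The sample values are hypothetical, and the rank formula assumes no tied values, as the $d_i$ form of the Spearman coefficient does.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize a feature column to zero mean and unit (population) std."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def ranks(values):
    """Rank values from 1 to n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r

def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data: stress amplitude (MPa) and cycles to failure.
stress_amplitude = [80.0, 100.0, 120.0, 140.0, 160.0]
cycles_to_failure = [9000.0, 6500.0, 4000.0, 2500.0, 1200.0]

z = z_score(stress_amplitude)
rho = spearman_rho(stress_amplitude, cycles_to_failure)
```

Because life falls monotonically as the stress amplitude rises in this toy data, `rho` comes out at -1, the strongest possible negative rank correlation.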
S3, dividing the feature-screened data set into a training set and a test set;
S4, using the training set, constructing a multi-model prediction model based on a linear regression model, a tree ensemble model, a gradient boosting model, a probabilistic model and a kernel function model; through the combined effect of these five methods, the complex mapping between stud fatigue life and the stress parameters and material properties is captured systematically, and the models complement one another in prediction accuracy, robustness and interpretability;
S4.1, training a linear regression model (LR):
A linear mapping between the load parameters (such as stress amplitude and maximum stress) and the low-cycle fatigue life of the stud is established by the least squares method; the linear regression model is:
$$N_f = \beta_0 + \beta_1\Delta\tau + \beta_2\tau_{max} + \varepsilon$$
where $N_f$ is the fatigue life, $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the regression coefficients of the stress amplitude $\Delta\tau$ and the maximum stress $\tau_{max}$, and $\varepsilon$ is the random error term;
The linear contributions of the input features are quantified by the coefficients $\beta_i$, providing a baseline for the subsequent nonlinear models; the closed-form solution is:
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}y$$
where $X$ is the design matrix, $y$ is the vector of observed lives, and $\hat{\beta}$ is the coefficient vector estimated by least squares;
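The closed-form least squares solution can be illustrated without external libraries by solving the normal equations $(X^{T}X)\beta = X^{T}y$ directly. The data below are hypothetical and were generated from $N_f = 10000 - 40\Delta\tau - 10\tau_{max}$ with no noise, so the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(features, targets):
    """Return [beta0, beta1, ...] from the normal equations."""
    X = [[1.0] + list(row) for row in features]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical (stress amplitude, maximum stress) pairs in MPa.
samples = [(80.0, 100.0), (100.0, 130.0), (120.0, 135.0),
           (140.0, 170.0), (160.0, 175.0)]
lives = [10000.0 - 40.0 * dt - 10.0 * tm for dt, tm in samples]

beta = fit_ols(samples, lives)  # [beta0, beta1, beta2]
```

In practice a numerically more stable solver (QR or SVD based) would usually be preferred over forming $X^{T}X$ explicitly; the normal equations are used here only because they mirror the formula in the text.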
S4.2, training a tree ensemble model:
Decision tree regression model (DTR): a single decision tree is constructed from the nonlinear relation between stress amplitude and material strength and provides a local assessment of feature importance. A recursive binary splitting strategy is adopted, with the stress amplitude $\Delta\tau$ and the stud tensile strength $f_u$ as key splitting features, and piecewise-constant (zero-order) nonlinear fitting is achieved by predicting the local mean. The split-threshold optimization problem is:
$$\min_{j,s}\left[\sum_{x_i \in R_L(j,s)} (y_i - \bar{y}_L)^2 + \sum_{x_i \in R_R(j,s)} (y_i - \bar{y}_R)^2\right]$$
where $j$ is the current splitting feature, $s$ is the split threshold, $R_L$ and $R_R$ are the left and right sub-regions after splitting, $\bar{y}_L$ and $\bar{y}_R$ are the mean lives of the samples in the left and right sub-regions, and $y_i$ is the true life of the i-th sample;
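A single step of the recursive binary split can be sketched as an exhaustive search over features and candidate thresholds. The use of midpoints between consecutive distinct values as candidates is an assumption of this sketch, and the (stress amplitude, tensile strength) rows and lives are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Search (feature j, threshold s) minimizing left SSE + right SSE."""
    best = None  # (loss, j, s)
    for j in range(len(rows[0])):
        column = sorted(set(r[j] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(column, column[1:]):
            s = (lo + hi) / 2
            left = [y for r, y in zip(rows, targets) if r[j] <= s]
            right = [y for r, y in zip(rows, targets) if r[j] > s]
            loss = sse(left) + sse(right)
            if best is None or loss < best[0]:
                best = (loss, j, s)
    return best

# Hypothetical samples: (stress amplitude in MPa, stud tensile strength in MPa).
rows = [(80.0, 450.0), (90.0, 400.0), (100.0, 450.0),
        (150.0, 400.0), (160.0, 450.0), (170.0, 400.0)]
lives = [9000.0, 8800.0, 8600.0, 2000.0, 1800.0, 1500.0]

loss, j, s = best_split(rows, lives)
```

With this data the search selects the stress amplitude (feature 0) at a threshold of 125, which separates the long-life and short-life groups; a full tree would repeat the search inside each sub-region.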
Random forest model (RF): multiple decision trees are combined, and variance is reduced and generalization improved by randomly selecting feature subspaces and sample subsets:
$$\hat{y}(x) = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$$
where $\hat{y}(x)$ is the average predicted value for input $x$, $B$ is the number of decision trees, $T_b$ is the prediction function of the b-th tree, and $x$ is the input feature vector;
Extremely randomized trees model (Extra Trees): additional randomness is introduced in the feature split points and in sample selection to improve model diversity:
$$\xi_j \sim U(\min(x_j), \max(x_j))$$
where $\xi_j$ is the random split threshold of feature $j$ and $U$ denotes the uniform distribution;
S4.3, training a gradient boosting model:
Gradient boosting tree model (GBDT): multiple weak learners are trained iteratively to improve the prediction step by step; the model at each step fits the residual of the previous step, the residual is computed from the gradient direction of the loss function to correct the prediction deviation, and finally all weak models are combined by weighting into a strong prediction model. The formulas are as follows:
Loss function: $L(y, F(x))$
where $L$ is the loss function measuring the difference between the model prediction and the true value, $y$ is the true label of the sample, $F(x)$ is the model prediction for input $x$, and $x$ is the input feature vector;
Residual calculation (step t):
$$r_{i,t} = -\left[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}\right]_{F = F_{t-1}}$$
where $r_{i,t}$ is the residual of the i-th sample at step t; $\partial L / \partial F$ is the partial derivative of the loss function with respect to the model, i.e. the rate of change of the loss with the predicted value; $F_{t-1}(x_i)$ is the prediction of the previous-step (step t-1) model for sample $x_i$; and $x_i$ is the input feature vector of the i-th sample;
Model update (learning rate $\eta$): $F_t(x) = F_{t-1}(x) + \eta \cdot h_t(x)$
where $F_t(x)$ is the prediction of the step-t model for input $x$, $\eta$ is the learning rate, and $h_t(x)$ is the prediction of the t-th decision tree for input $x$;
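The residual-fitting loop can be sketched with one-split regression stumps as weak learners. The squared-error loss $\frac{1}{2}(y - F(x))^2$ is an assumption of this sketch (the text does not reproduce its loss formula); under it the negative gradient is simply $y_i - F_{t-1}(x_i)$. The single-feature data, the number of rounds and the learning rate are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression stump h(x) to the residuals (1-D feature)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        s = (lo + hi) / 2
        left = [r for v, r in zip(x, residuals) if v <= s]
        right = [r for v, r in zip(x, residuals) if v > s]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        loss = (sum((r - lm) ** 2 for r in left)
                + sum((r - rm) ** 2 for r in right))
        if best is None or loss < best[0]:
            best = (loss, s, lm, rm)
    _, s, lm, rm = best
    return lambda v: lm if v <= s else rm

def gradient_boost(x, y, n_rounds=50, eta=0.1):
    """F_t = F_{t-1} + eta * h_t, with h_t fitted to r_{i,t} = y_i - F_{t-1}(x_i)."""
    f0 = sum(y) / len(y)  # constant initial model
    stumps = []
    preds = [f0] * len(y)
    for _ in range(n_rounds):
        # Negative gradient of the assumed squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        h = fit_stump(x, residuals)
        stumps.append(h)
        preds = [pi + eta * h(xi) for pi, xi in zip(preds, x)]
    return lambda v: f0 + eta * sum(h(v) for h in stumps)

# Hypothetical data: stress amplitude (MPa) and cycles to failure.
stress = [80.0, 100.0, 120.0, 140.0, 160.0]
life = [9000.0, 6500.0, 4000.0, 2500.0, 1200.0]

model = gradient_boost(stress, life)
```

A small learning rate means each stump corrects only a fraction of the remaining residual, so the training error shrinks gradually over the rounds rather than in one step.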
Extreme gradient boosting tree model (XGBoost): regularization (L1/L2) and a second-order Taylor expansion are introduced to prevent overfitting, and parallel computation and missing-value handling are supported. The objective is:
$$\mathcal{L} = \sum_{i=1}^{n} l(y_i, F(x_i)) + \sum_{k=1}^{K} \Omega(f_k)$$
where $\mathcal{L}$ is the overall objective function of the model; $l(y_i, F(x_i))$ is the loss of the i-th sample, measuring the error of the prediction $F(x_i)$ against the true value $y_i$; $y_i$ is the true label or target value of the i-th sample; $F(x_i)$ is the model prediction for the i-th sample $x_i$; $x_i$ is the input feature vector of the i-th sample; $\Omega(f_k)$ is the regularization term of the k-th tree, which controls model complexity; and $f_k$ is the k-th decision tree;
S4.4, training a probabilistic model:
Gaussian process regression (GPR): a Bayesian non-parametric regression method used for regression and uncertainty estimation; a kernel function is constructed to map the nonlinear relation and to provide an estimate of prediction uncertainty:
$$k(x_i, x_j) = \sigma^2 \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2l^2}\right)$$
where $k(x_i, x_j)$ is the kernel function value between inputs $x_i$ and $x_j$, measuring their similarity; $\sigma^2$ is the signal variance, controlling the amplitude of the function; and $l$ is the length scale, controlling how smoothly the function varies;
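A kernel with a signal variance and a length scale, as described above, can be sketched as follows. The squared-exponential (RBF) form is assumed here because it matches the two parameters named in the text; the input vectors and parameter values are hypothetical.

```python
import math

def rbf_kernel(xi, xj, signal_variance=1.0, length_scale=1.0):
    """k(xi, xj) = sigma^2 * exp(-||xi - xj||^2 / (2 * l^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return signal_variance * math.exp(-sq_dist / (2 * length_scale ** 2))

# Hypothetical standardized inputs: (stress amplitude, tensile strength).
a = (0.0, 0.0)
near = (0.5, 0.0)
far = (3.0, 0.0)

k_same = rbf_kernel(a, a, signal_variance=2.0)
k_near = rbf_kernel(a, near, signal_variance=2.0)
k_far = rbf_kernel(a, far, signal_variance=2.0)
```

The kernel equals the signal variance for identical inputs and decays toward zero with distance; a larger length scale makes distant samples count as more similar, which yields a smoother fitted function.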
Bayesian regression model: the posterior distribution of the model parameters is quantified using Bayes' theorem, and confidence intervals for the predicted low-cycle fatigue life of the stud are output:
Posterior distribution, assuming the Gaussian prior $\beta \sim N(0, \Sigma)$:
$$P(\beta \mid y, X) = \frac{P(y \mid X, \beta)\,P(\beta)}{P(y \mid X)}, \qquad \beta \mid y, X \sim N(\bar{\beta}, A^{-1})$$
where $P(\beta \mid y, X)$ is the posterior probability distribution of the model parameters $\beta$ given the data $y$ and $X$; $\beta$ is the parameter vector of the model; $y$ is the observed target vector; $X$ is the input feature matrix; $P(y \mid X, \beta)$ is the probability of observing the target values $y$ given the features $X$ and parameters $\beta$; $P(\beta)$ is the prior probability distribution of $\beta$; and $P(y \mid X)$ is the probability of observing $y$ given the features $X$;
$N(0, \Sigma)$ is the Gaussian distribution with mean 0 and covariance matrix $\Sigma$;
$N(\bar{\beta}, A^{-1})$ is the Gaussian distribution with mean $\bar{\beta}$ and covariance matrix $A^{-1}$; $\bar{\beta}$ is the posterior mean, typically the maximum a posteriori estimate of the parameters, and $A^{-1}$ is the posterior covariance matrix, i.e. the inverse of the posterior precision matrix $A$;
S4.5, training the kernel function model, wherein a polynomial kernel (poly) is adopted as the kernel function of the support vector regression (SVR) model, providing flexible nonlinear fitting capability and adapting to variations in the data at different scales:
The data are mapped to a high-dimensional space through the kernel function and fitted within an ε-insensitive band; the formulas are as follows:
Optimization problem:
$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)$$
Wherein w is the model weight vector; C is the regularization parameter, controlling the degree of penalty on the training error; and ξ_i and ξ_i^* are slack variables representing the training error of sample i at the upper and lower boundaries, respectively;
Constraints:
$$y_i - \left( w^\top \varphi(x_i) + b \right) \le \varepsilon + \xi_i, \qquad \left( w^\top \varphi(x_i) + b \right) - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0$$
Wherein y_i is the true target value of the i-th sample; φ(x_i) is the function that maps the input x_i to the high-dimensional feature space; b is the bias term of the model; and ε is the width of the insensitive band, meaning that errors within the ε range do not count toward the loss;
Prediction equation:
$$f(x) = \sum_{i=1}^{n} \left( \alpha_i - \alpha_i^* \right) K(x_i, x) + b$$
Wherein f(x) is the predicted value of the model; α_i and α_i^* are the Lagrange multipliers corresponding to the upper and lower constraint boundaries of sample i; and K(x_i, x) is the kernel function;
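The three ingredients of S4.5 (polynomial kernel, ε-insensitive loss, and the kernel expansion used for prediction) can be sketched in plain Python. The parameter names `degree`, `gamma`, and `coef0` follow a common convention for polynomial kernels and are assumptions here, as are the function names; the patent text does not give kernel hyperparameters.

```python
def poly_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree

def eps_insensitive_loss(y_true, y_pred, eps):
    """Errors inside the band of half-width eps cost nothing."""
    return max(0.0, abs(y_true - y_pred) - eps)

def svr_predict(x, support_vectors, dual_coefs, bias, **kernel_args):
    """f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs holds the differences (alpha_i - alpha_i*), one per
    support vector.
    """
    return sum(
        coef * poly_kernel(sv, x, **kernel_args)
        for sv, coef in zip(support_vectors, dual_coefs)
    ) + bias

# Hypothetical values chosen only to exercise the functions
k_value = poly_kernel([1.0, 2.0], [3.0, 4.0], degree=2)
inside_band = eps_insensitive_loss(10.0, 10.05, eps=0.1)
outside_band = eps_insensitive_loss(10.0, 10.5, eps=0.1)
prediction = svr_predict([2.0, 0.0], [[1.0, 0.0]], [0.5], bias=1.0, degree=2)
```

Only samples with a nonzero coefficient difference contribute to the prediction, which is why the sum runs over support vectors rather than over the whole training set.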
S5, using the test set, defining a hyperparameter space and generating all parameter combinations through grid search, training and evaluating each model with five-fold cross-validation (K=5) to find the optimal parameters, and testing the reliability of the multi-model prediction model until the prediction accuracy and training time meet preset conditions, thereby obtaining the optimal model ensemble;
Assume that the k hyperparameters take values in the sets Θ_1, Θ_2, …, Θ_k; the search space is then the Cartesian product of all the hyperparameter value sets:
$$\Theta = \Theta_1 \times \Theta_2 \times \dots \times \Theta_k$$
The optimal parameter θ* is chosen such that the model evaluation index F(θ) is maximized:
$$\theta^* = \arg\max_{\theta \in \Theta} F(\theta)$$
Wherein Θ_1, Θ_2, and Θ_k are the value sets of the first, second, and k-th hyperparameters, respectively; Θ is the hyperparameter search space, containing all possible hyperparameter combinations; θ* is the optimal hyperparameter combination; and F(θ) is the evaluation index of the model under the hyperparameter combination θ;
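The search over Θ with five-fold cross-validation can be sketched with the standard library alone. The scoring function `toy_score`, the grid values, and the hyperparameter names `max_depth` and `learning_rate` are hypothetical stand-ins; a real run would train a model on the training indices and score it on the held-out fold.

```python
from itertools import product

def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for fold in range(k):
        size = n_samples // k + (1 if fold < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def grid_search(param_grid, score_fn, n_samples, k=5):
    """Return the combination in the Cartesian product with the best mean score."""
    names = sorted(param_grid)
    folds = k_fold_indices(n_samples, k)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(n_samples) if i not in held]
            scores.append(score_fn(params, train, held_out))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_score(params, train_idx, val_idx):
    """Hypothetical score that peaks at max_depth=4, learning_rate=0.1."""
    return -((params["max_depth"] - 4) ** 2) - abs(params["learning_rate"] - 0.1)

grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.2]}
best_params, best_score = grid_search(grid, toy_score, n_samples=12)
```

Because F(θ) is maximized, an error metric such as RMSE would be passed in negated so that a smaller error gives a larger score.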
The root mean square error (RMSE), the mean absolute percentage error (MAPE), and the coefficient of determination R² are used as evaluation indexes to evaluate the optimal model ensemble; the formulas are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
Wherein y_i is the actual value, ŷ_i is the predicted value, n is the sample size, and ȳ is the average of the actual values;
The results are shown in the following table:
As the table shows, RF has the smallest RMSE and the largest R², while XGBoost has the smallest MAPE; overall, the RF, XGBoost, and DTR models perform relatively well;
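The three evaluation indexes can be computed directly from their definitions. The sketch below uses the standard textbook forms of RMSE, MAPE (in percent), and R²; the sample values are hypothetical and are not the patent's experimental data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent; y_true must be nonzero."""
    n = len(y_true)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical fatigue lives (cycles) and predictions
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
scores = (rmse(actual, predicted), mape(actual, predicted), r2(actual, predicted))
```

RMSE and MAPE are better when smaller and R² is better when closer to 1, which matches how the models are ranked in the text.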
S6, dynamically adjusting the weights of the individual models in the optimal model ensemble, fusing the weights, and constructing a weighted voting mechanism to obtain the integrated model;
To address the high overfitting risk, insufficient generalization capability, and low feature utilization of existing single models in complex data scenarios, a hybrid ensemble framework is provided that combines the advantages of the random forest model and the extreme gradient boosting tree (XGBoost) model, in which the sample weights of the extreme gradient boosting tree model are adjusted according to the feature importance obtained from the random forest;
S6.1, generating a plurality of decision trees through the random forest model and extracting the feature importance weights, wherein the formula is as follows:
Wherein Importance(f) is the importance of feature f; T is the total number of decision trees; S_f is the set of nodes whose splitting feature is f; and ΔImpurity(s) is the decrease in impurity after node s is split;
The feature importance is then normalized so that each feature importance weight w_f lies in [0, 1], with the formula:
$$w_f = \frac{\mathrm{Importance}(f)}{\sum_{j=1}^{F} \mathrm{Importance}(j)}$$
Wherein w_f is the normalized importance weight of feature f; F is the total number of features; and Importance(j) is the unnormalized importance of the j-th feature;
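The normalization step in S6.1 divides each raw importance by the sum over all features. The sketch below assumes the raw importances have already been accumulated from the forest; the feature names and raw values are hypothetical.

```python
def normalize_importance(raw_importance):
    """Scale raw importances so they lie in [0, 1] and sum to 1."""
    total = sum(raw_importance.values())
    if total <= 0:
        raise ValueError("total importance must be positive")
    return {name: value / total for name, value in raw_importance.items()}

# Hypothetical raw impurity decreases for three input features
raw = {"delta_tau": 6.0, "tau_max": 3.0, "d": 1.0}
weights = normalize_importance(raw)
```

Since every raw importance is a sum of impurity decreases and therefore non-negative, the normalized weights are guaranteed to fall in [0, 1].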
S6.2, performing gradient boosting on the input features based on the extreme gradient boosting tree model and dynamically adjusting the sample weights;
First, the sample weights are calculated; the weight S_i of sample i is determined by the weighted sum over all of its features, with the formula:
$$S_i = \sum_{f=1}^{F} w_f \left| x_{i,f} - \mu_f \right|$$
Wherein S_i is the weight of the i-th sample; w_f is the normalized weight of feature f, taken from the result of S6.1; μ_f is the mean value of feature f; and |x_{i,f} − μ_f| measures the deviation of sample i on feature f;
Second, the sample weight S_i is introduced into the objective function of the extreme gradient boosting tree model, with the formula:
$$L = \sum_{i=1}^{n} S_i \, l\left(y_i, F(x_i)\right) + \sum_{k} \Omega(f_k)$$
Wherein L is the overall objective function of the model after the sample weights are introduced;
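The sample-weighting step can be sketched as follows. The per-sample loss is taken to be the squared error and the regularization term is left out; both are simplifying assumptions, since the patent does not fix the loss function here. Function names and data are hypothetical.

```python
def sample_weights(X, feature_weights):
    """S_i = sum_f w_f * |x_{i,f} - mu_f| for every row of X."""
    n = len(X)
    n_features = len(feature_weights)
    means = [sum(row[f] for row in X) / n for f in range(n_features)]
    return [
        sum(w * abs(row[f] - means[f]) for f, w in enumerate(feature_weights))
        for row in X
    ]

def weighted_squared_loss(y_true, y_pred, weights):
    """Data term of the weighted objective, assuming squared-error loss."""
    return sum(s * (y - p) ** 2 for s, y, p in zip(weights, y_true, y_pred))

# Hypothetical data: two features, the second one constant
X = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
S = sample_weights(X, feature_weights=[0.75, 0.25])
loss = weighted_squared_loss([1.0, 2.0, 3.0], [2.0, 2.0, 5.0], S)
```

A sample lying exactly at the feature means receives weight zero under this formula, so in practice a small floor on the weights may be worth considering; that floor is not part of the patent text.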
S6.3, fusing the feature importance weights of the random forest model with the sample weights of the extreme gradient boosting tree model, constructing a weighted voting mechanism, and generating the final prediction result to obtain the integrated model, wherein the fusion formula is as follows:
Wherein the formula combines the initial prediction of the random forest model for the input feature vector x with the residual predicted by the extreme gradient boosting tree model, and α is a weight hyperparameter that controls the contribution ratio of the random forest model's initial prediction and the extreme gradient boosting tree model's residual correction;
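The text describes the fusion only in words, so the exact formula is not known. One plausible reading, shown purely as an assumption, is that the random forest prediction is corrected by an α-scaled residual; the function `fuse` and its arguments are hypothetical.

```python
def fuse(rf_prediction, xgb_residual, alpha):
    """Assumed fusion: RF initial prediction plus alpha times the predicted residual.

    This additive form is an illustrative guess; a convex combination such as
    alpha * rf + (1 - alpha) * correction would be equally consistent with
    the wording of S6.3.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [0, 1]")
    return rf_prediction + alpha * xgb_residual

# Hypothetical values: RF predicts 1000 cycles, the residual model predicts +50
fused = fuse(1000.0, 50.0, alpha=0.5)
```

Under this reading, α = 0 reproduces the random forest prediction and α = 1 applies the full residual correction.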
Fig. 3 compares the prediction results of the integrated model with those of the XGBoost and RF models, respectively;
Compared with the XGBoost and RF models, the predicted points of the integrated model cluster more closely around the ideal prediction line and all three performance indexes improve; the specific values are shown in the following table:
From the above table, compared with the XGBoost model, the integrated model reduces RMSE by 14.83% and improves R² by 7.32%; compared with the RF model, it reduces MAPE by 8.91% and RMSE by 6.57% and improves R² by 2.66%.
S7, predicting the low cycle fatigue life of the target stud by using the integrated model;
S8, to address the black-box decision problem caused by the complexity of machine learning models in the prior art, introducing an interpretability analysis framework based on SHapley Additive exPlanations (SHAP), which quantifies the marginal contribution of each feature to the model output and thereby enables visualization of the prediction results and their mapping to physical mechanisms;
The SHAP interpretation tool is introduced to explain the prediction results of the integrated model in detail;
S8.1, adopting the Kernel SHAP algorithm in the SHAP framework, approximating the Shapley values by weighted linear regression, and calculating the marginal contribution of each input feature to the predicted value; for the stud life prediction model, the kernel function is defined as follows:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\left(|N| - |S| - 1\right)!}{|N|!} \left[ f\left(S \cup \{i\}\right) - f(S) \right]$$
Wherein N is the set of all input features, S is a subset of features that does not contain input feature i, φ_i is the SHAP value of input feature i, and f(S) is the predicted output of the model given the feature subset S;
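For a handful of features the Shapley value can be computed exactly by enumerating subsets, which makes the definition concrete; Kernel SHAP approximates the same quantity when enumeration is too expensive. The value function `toy_model` and the feature names below are hypothetical and stand in for the trained life prediction model.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating every subset S of N without i."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! for subsets of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi

def toy_model(subset):
    """Hypothetical f(S): two main effects plus one interaction term."""
    out = 0.0
    if "delta_tau" in subset:
        out += 10.0
    if "f_u" in subset:
        out += 4.0
    if "delta_tau" in subset and "f_u" in subset:
        out += 6.0
    return out

phi = shapley_values(["delta_tau", "f_u", "d"], toy_model)
```

The interaction term is split evenly between the two features that share it, and the values sum to the difference between the full-model output and the empty-set output, which is the additivity property SHAP relies on.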
S8.2, calculating the importance of the input features: based on the global interpretation of the SHAP values, for example through a summary plot and a sketch plot, the average contribution of each input feature to the life prediction is calculated;
S8.3, performing local decision analysis: the prediction result is analyzed comprehensively by examining the combined influence of multiple input features, for example through a dependence plot.
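The global importance in S8.2 is commonly summarized as the mean absolute SHAP value of each feature across samples. The sketch below assumes the per-sample SHAP values are already available as dictionaries; the numbers and function names are hypothetical.

```python
def mean_abs_shap(shap_rows):
    """Average |SHAP value| per feature over all samples."""
    n = len(shap_rows)
    names = list(shap_rows[0])
    return {name: sum(abs(row[name]) for row in shap_rows) / n for name in names}

def rank_features(importance):
    """Feature names sorted from most to least important."""
    return sorted(importance, key=importance.get, reverse=True)

# Hypothetical SHAP values for two samples
rows = [
    {"delta_tau": -4.0, "d": 1.0},
    {"delta_tau": 2.0, "d": -1.0},
]
importance = mean_abs_shap(rows)
ranking = rank_features(importance)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.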
It should be noted that the above embodiments are merely preferred embodiments of the present invention, and the present invention is not limited thereto; those skilled in the art may still modify the technical solutions described above or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (7)

Translated from Chinese

1. A method for predicting the low-cycle fatigue life of a stud, characterized by comprising the following steps:
S1. establishing an original data set, wherein the original data set includes a plurality of input features and the stud low-cycle fatigue life, the input features being the independent variables and the stud low-cycle fatigue life being the output;
S2. performing data standardization and feature screening on the input features in the original data set to obtain a feature-screened original data set;
S3. dividing the feature-screened original data set into a training set and a test set;
S4. using the training set, constructing a multi-model prediction model based on a linear regression model, tree-based ensemble models, gradient boosting models, probabilistic models, and a kernel function model;
S5. using the test set, applying five-fold cross-validation combined with grid search to find the optimal parameters and test the reliability of the multi-model prediction model until the prediction accuracy and training time meet preset conditions, thereby obtaining the optimal model ensemble;
S6. dynamically adjusting the weights of the individual models in the optimal model ensemble, fusing the weights, and constructing a weighted voting mechanism to obtain an integrated model;
S7. predicting the low-cycle fatigue life of a target stud using the integrated model;
S8. introducing the SHAP explanation tool to explain the prediction results of the integrated model in detail.

2. The method for predicting the low-cycle fatigue life of a stud according to claim 1, characterized in that: in S1, the input features fall into three categories, namely load parameters, material parameters, and geometric parameters;
the load parameters include the loading value P, stress amplitude Δτ, maximum stress τ_max, minimum stress τ_min, and mean stress τ_mean;
the material parameters include the concrete strength f_c and the stud tensile strength f_u;
the geometric parameters include the stud diameter d.

3. The method for predicting the low-cycle fatigue life of a stud according to claim 1, characterized in that: in S2,
the data standardization of the input features uses Z-score standardization to eliminate dimensional differences, with the formula:
$$x_{norm} = \frac{x - \mu}{\sigma}$$
where μ is the mean of the original data, σ is the standard deviation of the original data, and x_norm is the standardized data, which has mean 0 and standard deviation 1;
the feature screening of the input features selects feature variables by the Spearman rank correlation coefficient, with the formula:
$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)}$$
where d_i is the rank difference of the i-th sample on the two variables, n is the total number of samples, and ρ is the correlation coefficient, which indicates the strength and direction of the rank correlation between the two variables and ranges from -1 to 1.

4. The method for predicting the low-cycle fatigue life of a stud according to claim 2, characterized in that: in S4, the specific steps of constructing the multi-model prediction model are:
S4.1 training the linear regression model:
a linear mapping between the load parameters and the stud low-cycle fatigue life is established by the least squares method, and the model expression of the linear regression model is:
$$N_f = \beta_0 + \beta_1 \Delta\tau + \beta_2 \tau_{max} + \epsilon$$
where N_f is the fatigue life, β_0 is the intercept term, β_1 and β_2 are the regression coefficients of the stress amplitude Δτ and the maximum stress τ_max, and ε is the random error term;
the β_i coefficients quantify the linear contributions of the input features and provide a baseline reference for the subsequent nonlinear models, with the analytical solution:
$$\hat{\beta} = \left(X^\top X\right)^{-1} X^\top y$$
where X is the design matrix, y is the observed life vector, and β̂ is the coefficient vector of the least squares estimate;
S4.2 training the tree-based ensemble models:
decision tree regression model: a recursive binary partitioning strategy is adopted, with the stress amplitude Δτ and the stud tensile strength f_u as the key splitting features; a single decision tree is built on their nonlinear relationship, and zero-order nonlinear fitting is achieved through local mean prediction; the splitting threshold optimization problem is:
$$\min_{j,\,s} \left[ \sum_{x_i \in R_L} \left( y_i - \bar{y}_L \right)^2 + \sum_{x_i \in R_R} \left( y_i - \bar{y}_R \right)^2 \right]$$
where j is the current splitting feature, s is the splitting threshold, R_L and R_R are the left and right sub-regions after splitting, ȳ_L and ȳ_R are the mean lives of the samples in the left and right sub-regions, and y_i is the true life of the i-th sample;
random forest model: multiple decision trees are integrated, and variance is reduced and generalization improved by randomly selecting feature subspaces and sample subsets, with the expression:
$$\hat{y}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x)$$
where ŷ(x) is the average predicted value for input x, B is the number of decision trees, T_b is the prediction function of the b-th tree, and x is the input feature vector;
extremely randomized trees model: additional randomness is introduced in the feature split points and sample sampling to increase model diversity, with the expression:
$$\xi_j \sim U\left(\min(x_j), \max(x_j)\right)$$
where ξ_j is the random splitting threshold of feature j and U denotes the uniform distribution;
S4.3 training the gradient boosting models:
gradient boosting tree model: multiple weak learners are trained iteratively to refine the prediction step by step; the model at each step fits the residual of the previous step, the residual is computed along the gradient direction of the loss function to correct the prediction bias, and all weak models are finally combined by weighting into a strong prediction model, with the formulas:
loss function: L(y, F(x)),
where L is the loss function, measuring the difference between the model's predicted value and the true value; y is the true label of the sample; F(x) is the model's predicted value for input x; and x is the input feature vector;
residual calculation (step t):
$$r_{i,t} = -\left[ \frac{\partial L\left(y_i, F(x_i)\right)}{\partial F(x_i)} \right]_{F = F_{t-1}}$$
where r_{i,t} is the residual of the i-th sample at step t; ∂L/∂F is the partial derivative of the loss function with respect to the model, representing the rate of change of the loss with the predicted value; F_{t-1}(x_i) is the prediction of the model from the previous step (step t-1) for sample x_i; and x_i is the input feature vector of the i-th sample;
model update (learning rate η):
$$F_t(x) = F_{t-1}(x) + \eta \cdot h_t(x)$$
where F_t(x) is the prediction of the model at step t for input x; η is the learning rate, controlling the step size of each model update; and h_t(x) is the prediction of the t-th decision tree for input x;
extreme gradient boosting tree model: regularization (L1/L2) and a second-order Taylor expansion are introduced to prevent overfitting, and parallel computation and missing-value handling are supported, with the formula:
$$L = \sum_{i=1}^{n} l\left(y_i, F(x_i)\right) + \sum_{k} \Omega(f_k)$$
where L is the overall objective function of the model; l(y_i, F(x_i)) is the loss of the i-th sample, measuring the prediction error of the model F(x_i) against the true value y_i; y_i is the true label or target value of the i-th sample; F(x_i) is the model's prediction for the i-th sample x_i; x_i is the input feature vector of the i-th sample; Ω(f_k) is the regularization term of the k-th tree, used to control model complexity; and f_k is the k-th decision tree;
S4.4 training the probabilistic models:
Gaussian process regression model: a kernel function is constructed to map nonlinear relationships and provide an assessment of prediction uncertainty, with the formula:
$$k(x_i, x_j) = \sigma^2 \exp\left( -\frac{\lVert x_i - x_j \rVert^2}{2 l^2} \right)$$
where k(x_i, x_j) is the kernel function value between inputs x_i and x_j, measuring their similarity; σ² is the signal variance, controlling the amplitude of the function; and l is the length scale, controlling the smoothness of the function's variation;
Bayesian regression model: the posterior distribution of the model parameters is quantified based on Bayes' theorem, and confidence intervals for the predicted stud low-cycle fatigue life are output, with the formula:
posterior distribution, assuming a Gaussian prior β ~ N(0, Σ):
$$P(\beta \mid y, X) = \frac{P(y \mid X, \beta)\, P(\beta)}{P(y \mid X)}$$
where P(β|y, X) is the posterior probability distribution of the model parameters β given the data y and X; β is the parameter vector of the model; y is the observed target vector; X is the input feature matrix; P(y|X, β) is the probability of observing the target values y given the features X and the parameters β; P(β) is the prior probability distribution of β; and P(y|X) is the probability of observing the target values y given the features X;
N(0, Σ) is a Gaussian distribution with mean 0 and covariance matrix Σ, where Σ is the prior covariance matrix of β;
the posterior is a Gaussian distribution whose mean is the posterior mean, typically the maximum a posteriori estimate of the parameters, and whose covariance matrix is A⁻¹, that is, the inverse of the posterior precision matrix A;
S4.5 training the kernel function model:
the data are mapped to a high-dimensional space through the kernel function and fitted within an ε-insensitive band, with the formulas:
optimization problem:
$$\min_{w,\,b,\,\xi,\,\xi^*} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right)$$
where w is the model weight vector; C is the regularization parameter, controlling the degree of penalty on the training error; and ξ_i and ξ_i^* are slack variables representing the training error of sample i at the upper and lower boundaries, respectively;
constraints:
$$y_i - \left( w^\top \varphi(x_i) + b \right) \le \varepsilon + \xi_i, \qquad \left( w^\top \varphi(x_i) + b \right) - y_i \le \varepsilon + \xi_i^*, \qquad \xi_i,\ \xi_i^* \ge 0$$
where y_i is the true target value of the i-th sample; φ(x_i) is the function that maps the input x_i to the high-dimensional feature space; b is the bias term of the model; and ε is the width of the insensitive band, meaning that errors within the ε range do not count toward the loss;
prediction equation:
$$f(x) = \sum_{i=1}^{n} \left( \alpha_i - \alpha_i^* \right) K(x_i, x) + b$$
where f(x) is the predicted value of the model; α_i and α_i^* are the Lagrange multipliers corresponding to the upper and lower constraint boundaries of sample i; and K(x_i, x) is the kernel function.

5. The method for predicting the low-cycle fatigue life of a stud according to claim 1, characterized in that the specific steps of S5 are:
using the test set, a hyperparameter space is defined and all parameter combinations are generated through grid search; each model is trained and evaluated with five-fold cross-validation (K=5) to find the optimal parameters and test the reliability of the multi-model prediction model until the prediction accuracy and training time meet preset conditions, thereby obtaining the optimal model ensemble;
assume that the k hyperparameters take values in the sets Θ_1, Θ_2, …, Θ_k; the search space is then the Cartesian product of all the hyperparameter value sets:
$$\Theta = \Theta_1 \times \Theta_2 \times \dots \times \Theta_k$$
the optimal parameter θ* is selected so that the model evaluation index F(θ) is maximized:
$$\theta^* = \arg\max_{\theta \in \Theta} F(\theta)$$
where Θ_1, Θ_2, and Θ_k are the value sets of the first, second, and k-th hyperparameters, respectively; Θ is the hyperparameter search space, containing all possible hyperparameter combinations; θ* is the optimal hyperparameter combination; and F(θ) is the evaluation index of the model under the hyperparameter combination θ;
the root mean square error RMSE, the mean absolute percentage error MAPE, and the coefficient of determination R² are used as evaluation indexes to evaluate the optimal model ensemble, with the formulas:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
where y_i is the actual value, ŷ_i is the predicted value, n is the sample size, and ȳ is the average of the actual values.

6. The method for predicting the low-cycle fatigue life of a stud according to claim 1, characterized in that the specific steps of S6 are:
S6.1 generating multiple decision trees through the random forest model and extracting the feature importance weights, with the formula:
where Importance(f) is the importance of feature f; T is the total number of decision trees; S_f is the set of nodes whose splitting feature is f; and ΔImpurity(s) is the decrease in impurity after node s is split;
the feature importance is then normalized so that each feature importance weight w_f lies in [0, 1], with the formula:
$$w_f = \frac{\mathrm{Importance}(f)}{\sum_{j=1}^{F} \mathrm{Importance}(j)}$$
where w_f is the normalized importance weight of feature f; F is the total number of features; and Importance(j) is the unnormalized importance of the j-th feature;
S6.2 performing gradient boosting on the input features based on the extreme gradient boosting tree model and dynamically adjusting the sample weights;
first, the sample weights are calculated; the weight S_i of sample i is determined by the weighted sum over all of its features, with the formula:
$$S_i = \sum_{f=1}^{F} w_f \left| x_{i,f} - \mu_f \right|$$
where S_i is the weight of the i-th sample; w_f is the normalized weight of feature f, taken from the result of S6.1; μ_f is the mean of feature f; and |x_{i,f} − μ_f| measures the deviation of sample i on feature f;
second, the sample weight S_i is introduced into the objective function of the extreme gradient boosting tree model, with the formula:
$$L = \sum_{i=1}^{n} S_i \, l\left(y_i, F(x_i)\right) + \sum_{k} \Omega(f_k)$$
where L is the overall objective function of the model after the sample weights are introduced;
S6.3 fusing the feature importance weights of the random forest model with the sample weights of the extreme gradient boosting tree model, constructing a weighted voting mechanism, and generating the final prediction result to obtain the integrated model, with the fusion formula:
where the formula combines the initial prediction of the random forest model for the input feature vector x with the residual predicted by the extreme gradient boosting tree model, and α is a weight hyperparameter that controls the contribution ratio of the random forest model's initial prediction and the extreme gradient boosting tree model's residual correction.

7. The method for predicting the low-cycle fatigue life of a stud according to claim 1, characterized in that the specific steps of S8 are:
S8.1 adopting the Kernel SHAP algorithm in the SHAP framework, approximating the Shapley values by weighted linear regression, and calculating the marginal contribution of each input feature to the predicted value; for the stud life prediction model, the kernel function is defined as:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,\left(|N| - |S| - 1\right)!}{|N|!} \left[ f\left(S \cup \{i\}\right) - f(S) \right]$$
where N is the set of all input features, S is a subset of features that does not contain input feature i, φ_i is the SHAP value of input feature i, and f(S) is the predicted output of the model given the feature subset S;
S8.2 calculating the importance of the input features: based on the global interpretation of the SHAP values, the average contribution of each input feature to the life prediction is calculated;
S8.3 performing local decision analysis: the prediction result is analyzed comprehensively by examining the combined influence of multiple input features.
CN202510838505.3A, priority date 2025-06-23, filing date 2025-06-23: Stud low cycle fatigue life prediction method (Pending), CN120764333A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510838505.3A (CN120764333A (en)) | 2025-06-23 | 2025-06-23 | Stud low cycle fatigue life prediction method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510838505.3A (CN120764333A (en)) | 2025-06-23 | 2025-06-23 | Stud low cycle fatigue life prediction method

Publications (1)

Publication Number | Publication Date
CN120764333A (en) | 2025-10-10

Family

ID=97238490

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510838505.3A (Pending; CN120764333A (en)) | Stud low cycle fatigue life prediction method | 2025-06-23 | 2025-06-23

Country Status (1)

Country | Link
CN (1) | CN120764333A (en)

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
