CN109326329A - An ensemble learning-based method for predicting the action sites of zinc-binding proteins in a non-equilibrium model - Google Patents

An ensemble learning-based method for predicting the action sites of zinc-binding proteins in a non-equilibrium model

Info

Publication number
CN109326329A
Authority
CN
China
Prior art keywords
zinc
sample
binding protein
prediction
action site
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811353819.0A
Other languages
Chinese (zh)
Other versions
CN109326329B (en)
Inventor
李慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinling Institute of Technology
Original Assignee
Jinling Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinling Institute of Technology
Priority to CN201811353819.0A
Publication of CN109326329A
Application granted
Publication of CN109326329B
Active (current legal status)
Anticipated expiration

Abstract

The invention discloses an ensemble learning-based method for predicting zinc-binding protein action sites in an imbalanced (non-equilibrium) mode. Protein source data are preprocessed according to the characteristics of zinc-binding protein action sites. Random down-sampling is used to balance the imbalanced zinc-binding site data, yielding several balanced sub-datasets. On each balanced sub-dataset, discriminative biochemical features of the protein are selected and represented as feature vectors. The feature vectors are used as input to a support vector machine base classifier, sample weights are calculated, and a sample-weighted probabilistic neural network model is then built. Finally, the support vector machine base classification model and the sample-weighted probabilistic neural network model are integrated to obtain the prediction model, which is used to identify zinc-binding protein action sites in a target sample.

Description

An ensemble learning-based method for predicting zinc-binding protein action sites in an imbalanced (non-equilibrium) mode
Technical field
The present invention relates to an ensemble learning-based method for predicting zinc-binding protein action sites in an imbalanced mode. It uses an ensemble classification model to identify zinc-binding protein action sites in an imbalanced classification setting, and belongs to the interdisciplinary field of proteomics and computer science.
Background
With the completion of the Human Genome Project, the life sciences have fully entered the post-genomic era, and the proteins expressed by genes have become one of the important research topics in the life and natural sciences. Proteins are the basic organic matter of cells and the material basis of life, and they play a decisive role in biological processes. This role, however, usually cannot be fulfilled by a single protein alone: in most cases a protein must interact with other proteins or with ligands to carry out a specific biological function.
In the cell, proteins are the agents and executors of vital activity. They fulfill key functions by interacting with ligands, for example in DNA synthesis, signal transduction, transcriptional activation of genes, metabolism, and antiviral defense. In addition, the study of protein interactions greatly advances the treatment of various diseases, especially those caused by invading viral proteins such as those of the Ebola virus: it can reveal the pathogenesis of certain diseases and guide the discovery of drug targets and the development of new drugs.
Metal ions bound to proteins act as cofactors and are decisive for the biological function of the protein and even for some life processes. Zinc is the second most abundant metal ion in organisms, after iron, and has an important regulatory role in growth and development, disease control, DNA synthesis, and so on. Zinc deficiency leads to diseases such as age-related degenerative diseases, malignant tumors, and Wilson disease. Zinc also plays an important role in aging, apoptosis, immune function, and oxidative stress. Zinc ions exert biological functions such as catalysis, structural stabilization, and coordination only when bound to proteins.
Zinc-binding protein action sites are mainly identified by biochemical experiments. Although these experimental methods can determine the interaction sites between a protein and zinc ions, they are costly, time-consuming, and laborious. Moreover, experiments are subject to different constraints and rely on different experimental principles, so their results contain a certain proportion of false negatives and false positives. Relying on experimental techniques alone to uncover the biological meaning of these data is therefore far from sufficient for the needs of biological research.
With the development of information technology and the emergence of massive biological data, the automatic identification of zinc-binding protein action sites by computational methods such as data mining and machine learning is an inevitable trend. These methods are inexpensive and fast, can compensate for the shortcomings of experiments, and provide direct support and guidance for costly experimental measurements of interactions.
Prediction of zinc-binding protein action sites is a binary classification problem. True binding sites are few and non-binding sites make up a very high proportion, so it is a typical imbalanced classification problem. Existing prediction methods build classification models with data mining and similar techniques and treat the two classes equally, without considering the imbalance of the data, which results in very low prediction precision for zinc-binding protein action sites. Studying the imbalance in zinc-binding site prediction and improving the classification accuracy of the minority class is therefore of significant research value.
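The effect of treating both classes equally can be illustrated with a small worked example. The sketch below is not from the patent: the 20-to-980 class ratio and the label strings are assumptions chosen only to show why overall accuracy is misleading on imbalanced data.

```python
# Illustrative only: a classifier that always predicts the majority class.
# The class ratio (20 binding vs. 980 non-binding sites) is an assumed example.
MINORITY = "binding"
MAJORITY = "non-binding"


def accuracy(y_true, y_pred):
    """Fraction of all samples that are classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def minority_recall(y_true, y_pred):
    """Fraction of true minority-class samples that are recovered."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == MINORITY and p == MINORITY)
    total = sum(1 for t in y_true if t == MINORITY)
    return hits / total


y_true = [MINORITY] * 20 + [MAJORITY] * 980
y_pred = [MAJORITY] * 1000  # trivial majority-class predictor

trivial_accuracy = accuracy(y_true, y_pred)       # 0.98
trivial_recall = minority_recall(y_true, y_pred)  # 0.0
```

The trivial predictor reaches 98% accuracy while finding none of the binding sites, which is why the minority-class performance is the quantity of interest here.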
Summary of the invention
The object of the present invention is to provide, for the imbalanced classification problem in zinc-binding protein action site prediction, an ensemble learning-based prediction method for zinc-binding protein action sites in an imbalanced mode.
To solve the above technical problem, the invention adopts the following technical solution:
An ensemble learning-based method for predicting zinc-binding protein action sites in an imbalanced mode, comprising the following steps:
Step 1: preprocess the protein source data according to the characteristics of zinc-binding protein action sites;
Step 2: balance the imbalanced zinc-binding protein action site data by random down-sampling to obtain several balanced sub-datasets;
Step 3: on each balanced sub-dataset, select discriminative biochemical features of the protein, represent them, and form feature vectors;
Step 4: use the feature vectors as input to the support vector machine base classifier, calculate the sample weights, build a sample-weighted probabilistic neural network model, and finally integrate the support vector machine base classification model and the sample-weighted probabilistic neural network model to obtain the prediction model;
Step 5: use the prediction model obtained in step 4 to identify the zinc-binding protein action sites in a target sample.
In step 1, the preprocessing removes the following noisy data:
(1) peptide chain structures with homology higher than 70% are removed;
(2) duplicated and short protein chains, as well as erroneous and unreliable data, are rejected;
(3) chains satisfying a sequence redundancy of less than 20% are removed.
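Criteria (1) and (2) can be sketched as a greedy redundancy filter. This is only an illustration under stated assumptions: `difflib.SequenceMatcher` is a crude stand-in for a real alignment-based homology measure, and the function name `filter_chains` and the `min_length` cutoff are hypothetical. Criterion (3) is not modeled.

```python
from difflib import SequenceMatcher


def similarity(seq_a, seq_b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()


def filter_chains(chains, max_similarity=0.70, min_length=50):
    """Greedily keep chains that are long enough and not too similar to any kept chain.

    chains: list of (chain_id, sequence) tuples.
    Exact duplicates have similarity 1.0, so the threshold removes them too.
    """
    kept = []
    for chain_id, seq in chains:
        if len(seq) < min_length:
            continue  # criterion (2): drop short chains
        if any(similarity(seq, kept_seq) > max_similarity for _, kept_seq in kept):
            continue  # criterion (1): drop chains too similar to one already kept
        kept.append((chain_id, seq))
    return kept
```

Because the filter is greedy, the result depends on the order of the input chains; sorting by data quality before filtering would be one way to control which representative is kept.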
In step 2, the balancing process applies random down-sampling to the majority-class samples, each time drawing the same number of samples as there are minority-class samples, to form several balanced sub-datasets. The majority-class samples are non-binding protein action sites and the minority-class samples are zinc-binding protein action sites.
In step 3, the discriminative biochemical features comprise the position-specific scoring matrix (PSSM), the conservation score, and RW-GRMTP (relative weight of gapless real matches to pseudocounts). The position-specific scoring matrix is normalized and processed with a histogram and a sliding window to obtain a 20-dimensional vector; the 20-dimensional conservation score is converted into a single value; RW-GRMTP is normalized to obtain a 2-dimensional vector; together these form a 23-dimensional feature vector.
In step 4, the base classifier SVM (support vector machine) is trained on each balanced sub-dataset, and the prediction error rate e_j and the importance weight α_j of the classification model are calculated according to formula (1) and formula (2), respectively.
Here the full dataset is D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, x_i ∈ X, where X is the instance space of the classification problem, y_i ∈ {1, -1}, i = 1, 2, …, n, and n is the number of samples. w_mi is the weight, with initial value 1/n, i.e. w_1 = (w_11, w_12, …, w_1n) with w_1i = 1/n, i = 1, 2, …, n, m = 1, 2. The base classifier SVM is trained on each of the k balanced datasets to obtain k classification predictions C_svm_j(x), j = 1, …, k.
The current sample weights are calculated and normalized: if a sample is classified correctly, its weight is reduced; if a sample is misclassified, its weight is increased. The calculation is given by formula (3).
To build the sample-weighted probabilistic neural network model, the protein feature data are weighted and the weighted sample data are used as input to a probabilistic neural network, which makes the prediction. This method is denoted SWPNN and its prediction result is SWPNN(x).
The support vector machine base classification model and the sample-weighted probabilistic neural network model are integrated to obtain the prediction model SSWPNN, SSWPNN = {SVM, SWPNN, kernelopt, spread, f}, where kernelopt and spread are the parameters of the SVM and SWPNN classifiers, respectively, and f is defined by formula (4). The corresponding weight β_j is calculated from the error rate at the same time.
Here δ is a threshold, and C_svm_j(x) and SWPNN(x) are the classification results of the SVM and SWPNN classifiers, respectively: a value greater than 0 predicts a positive-class sample and a value less than 0 predicts a negative-class sample. If the value of SVM(X) is positive and less than the threshold δ, and SWPNN(X) predicts a negative example, the final ensemble prediction is a negative example; in all other cases, the SVM(X) result is the final decision.
In step 5, the ensemble model SSWPNN is applied to the entire test dataset for each sub-model, giving different classification results. These results are then combined by weighted integration to finally identify the zinc-binding protein action sites in the target sample, as shown in formula (5).
Beneficial effects:
From the perspective of machine learning, and addressing the problem of identifying zinc-binding protein action sites in an imbalanced mode, the invention proposes a novel ensemble learning-based prediction method for zinc-binding protein action sites. It effectively handles the prediction of zinc-binding protein action sites under imbalanced classification and achieves a certain prediction accuracy. After extension, the invention can be applied to predicting the protein action sites of other types of metal ions.
Brief description of the drawings
The invention is further described below with reference to the drawings and the specific embodiment, from which the above and other advantages of the invention will become clearer.
Fig. 1 is the overall framework of the method of the invention.
Fig. 2 is the framework of the zinc-binding protein action site classifier based on the SVM and SWPNN models.
Fig. 3 is the prediction process of the SSWPNN classifier.
Specific embodiment
The invention may be better understood from the following embodiment.
The overall procedure of the invention is shown in Fig. 1.
The invention addresses the prediction of zinc-binding protein action sites on imbalanced datasets. Down-sampling is used to balance the data. Ensemble techniques are used to build a classifier model based on a support vector machine and a sample-weighted probabilistic neural network, and the model is used to classify and identify zinc-binding protein action sites. The specific implementation steps are as follows:
1. Balancing
The zinc-binding protein action sites are called minority-class samples (negative-class samples); the non-binding protein action sites are called majority-class samples (positive-class samples). The majority-class samples are randomly down-sampled without replacement. To avoid the loss of useful majority-class information that a single random down-sampling may cause, the sampling without replacement is repeated multiple times over the full dataset. Each draw extracts the same number of samples as there are minority-class samples, so the majority-class samples are divided into k subsets, and each subset is merged with the minority-class samples to form the balanced datasets D_1, D_2, …, D_k. The process is described by Algorithm 1:
Algorithm 1: Data balancing
Input: protein sequence sample data D
Output: balanced sub-datasets D_1, D_2, …, D_k
BEGIN
  Divide(D)
  N = CountUp(MinoritySample)
  FOR i = 1 TO k
    ExtractedSample_i = RandomExtract(MajoritySample, N)
    D_i = Merge(MinoritySample, ExtractedSample_i)
    MajoritySample = MajoritySample - ExtractedSample_i
  END FOR
END
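Algorithm 1 can be sketched in Python as follows. The function name `make_balanced_subsets` and the `seed` parameter are illustrative assumptions; shuffling the majority class once and slicing it into consecutive blocks is equivalent to repeatedly drawing N samples without replacement and removing them from the pool.

```python
import random


def make_balanced_subsets(minority, majority, k, seed=0):
    """Build k balanced datasets D_1..D_k from one minority and one majority set.

    Each D_i contains all minority samples plus N = len(minority) majority
    samples; no majority sample is used in more than one subset.
    """
    n = len(minority)
    if k * n > len(majority):
        raise ValueError("not enough majority samples for k disjoint draws")
    rng = random.Random(seed)
    pool = list(majority)
    rng.shuffle(pool)  # random order, so consecutive slices are random draws
    subsets = []
    for i in range(k):
        extracted = pool[i * n:(i + 1) * n]  # RandomExtract + removal from pool
        subsets.append(list(minority) + extracted)  # Merge
    return subsets
```

For example, with 3 minority samples, 10 majority samples, and k = 3, each subset holds 6 samples and one majority sample is left unused.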
2. Feature representation
Discriminative biochemical features are selected: the position-specific scoring matrix (PSSM), the conservation score, and RW-GRMTP (relative weight of gapless real matches to pseudocounts). These are represented as features and assembled into the feature vector set. The position-specific scoring matrix is normalized and processed with a histogram and a sliding window to obtain a 20-dimensional vector; the 20-dimensional conservation score is converted into a single value; RW-GRMTP is normalized to obtain a 2-dimensional vector; together these form a 23-dimensional feature vector.
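The assembly of the 23-dimensional vector (20 + 1 + 2) can be sketched as below. The text does not give the exact normalization, histogram, or window procedure, so everything specific here is an assumption made for illustration: logistic normalization, a window average standing in for the histogram and sliding-window step, the mean as the single conservation value, and the window width of 7.

```python
import math


def logistic(x):
    """Map a raw score to (0, 1); an assumed normalization."""
    return 1.0 / (1.0 + math.exp(-x))


def pssm_window_feature(pssm, index, window=7):
    """20 values: column-wise mean of normalized PSSM rows in a window around index."""
    half = window // 2
    rows = [pssm[i] for i in range(index - half, index + half + 1) if 0 <= i < len(pssm)]
    return [sum(logistic(row[c]) for row in rows) / len(rows) for c in range(20)]


def residue_feature(pssm, conservation, rw_grmtp, index, window=7):
    """Assemble the 23-dimensional feature vector for one residue.

    pssm:         list of 20-value rows, one per residue
    conservation: list of 20-value conservation scores, one per residue
    rw_grmtp:     list of 2-value RW-GRMTP entries, one per residue
    """
    features = pssm_window_feature(pssm, index, window)            # 20 dims
    features.append(sum(conservation[index]) / len(conservation[index]))  # 1 dim
    features.extend(logistic(v) for v in rw_grmtp[index])          # 2 dims
    return features
```

Near the ends of a chain the window is simply truncated in this sketch; padding with zeros would be an equally plausible choice.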
3. Integrating the support vector machine and the sample-weighted probabilistic neural network model
The base classifier, a support vector machine, is trained first. According to its classification results, the samples are weighted so that "hard" samples, which lie near the class boundary and are easily misclassified, receive more emphasis, and a weighted probabilistic neural network model is then trained.
Let the full dataset be D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, x_i ∈ X, where X is the instance space of the classification problem, y_i ∈ {1, -1}, i = 1, 2, …, n, and n is the number of samples. The process is:
Step 1: train an SVM classifier on each balanced sub-dataset.
The base classifier SVM is trained on each of the k balanced sub-datasets using 5-fold cross-validation, giving k classification predictions C_svm_j(x), j = 1, …, k. The prediction error rate is denoted e_j and the importance weight of the classification model is α_j; they are calculated by formulas (1) and (2). In formula (1), w_mi is the weight, with initial value 1/n, i.e. w_1 = (w_11, w_12, …, w_1n) with w_1i = 1/n, i = 1, 2, …, n, m = 1, 2.
Step 2: calculate the current sample weights and normalize them.
After the first round of prediction by the base classifier SVM, the weight of a correctly classified sample is reduced for the next round of prediction; conversely, the weight of a misclassified sample is increased for the next round. The sample weight function is calculated by formula (3).
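Formulas (1) to (3) are not reproduced in this text. The description (weights start at 1/n, fall for correctly classified samples, rise for misclassified ones, and are then normalized) matches the classic AdaBoost update, so the sketch below uses that update as an assumption; the patent's own formulas may differ. The function names are hypothetical, and the SVM predictions are taken as given inputs.

```python
import math


def error_rate(weights, y_true, y_pred):
    """Weighted error e_j: total weight of the misclassified samples."""
    return sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)


def model_weight(e, eps=1e-10):
    """Importance weight alpha_j = 0.5 * ln((1 - e_j) / e_j), clipped for stability."""
    e = min(max(e, eps), 1.0 - eps)
    return 0.5 * math.log((1.0 - e) / e)


def update_weights(weights, y_true, y_pred, alpha):
    """w_i <- w_i * exp(-alpha * y_i * h(x_i)), then normalize to sum to 1.

    Labels and predictions are in {1, -1}, so y_i * h(x_i) is +1 when the
    sample is classified correctly (weight shrinks) and -1 otherwise (weight grows).
    """
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return [r / total for r in raw]
```

With four equally weighted samples and one misclassification, e_j = 0.25 and the misclassified sample ends up carrying half of the total weight after normalization, which is the usual AdaBoost behavior.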
Step 3: train the sample-weighted PNN predictor SWPNN.
The feature sample data are weighted with the weights calculated in Step 2, and a weighted probabilistic neural network model is trained. The proposed method is denoted SWPNN and its prediction result is SWPNN(x). The framework of the zinc-binding site classifier based on the SVM and SWPNN models is shown in Fig. 2.
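A probabilistic neural network is essentially a Parzen-window classifier, so one way to picture SWPNN is a kernel sum in which each training sample contributes in proportion to its weight. The sketch below is an assumption, not the patent's definition: it weights the kernel contributions (rather than scaling the feature values), treats `spread` as the Gaussian width, and returns a signed score whose sign gives the class.

```python
import math


def swpnn_score(x, train_x, train_y, weights, spread=1.0):
    """Signed, sample-weighted Parzen score for one query point x.

    train_y holds labels in {1, -1}. A positive score favors class +1,
    a negative score favors class -1.
    """
    score = 0.0
    for xi, yi, wi in zip(train_x, train_y, weights):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        score += yi * wi * math.exp(-sq_dist / (2.0 * spread ** 2))
    return score
```

A smaller `spread` makes the score depend mostly on the nearest training samples, while a larger one smooths the decision boundary.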
Step 4: integrate the base classification model SVM and the sample-weighted SWPNN classifier.
Integrating the base classifier SVM with the sample-weighted probabilistic neural network model gives a new prediction method, SSWPNN, i.e. SSWPNN = {SVM, SWPNN, kernelopt, spread, f}, where kernelopt and spread are the parameters of the SVM and SWPNN classifiers, respectively, and f is defined by formula (4). The corresponding weight β_j (the weight of this base classifier in the final classifier) is calculated from the error rate at the same time.
Here δ is a threshold, and C_svm_j(x) and SWPNN(x) are the classification results of the SVM and SWPNN classifiers, respectively: a value greater than 0 predicts a positive-class sample and a value less than 0 predicts a negative-class sample. If the value of SVM(X) is positive but small, i.e. less than the threshold δ, and SWPNN(X) predicts a negative example, the final ensemble prediction is a negative example; in all other cases, the SVM(X) result is the final decision.
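The decision rule described for f can be written directly as a small function. The name `combine` is hypothetical, and treating a score of exactly 0 as negative is an assumption, since the text only covers values greater than or less than 0.

```python
def combine(svm_value, swpnn_value, delta):
    """Combine one SVM decision value with the SWPNN result.

    Returns 1 for the positive class and -1 for the negative class.
    """
    # Weakly positive SVM output overruled by a negative SWPNN prediction.
    if 0 < svm_value < delta and swpnn_value < 0:
        return -1
    # Otherwise the SVM result decides (0 is treated as negative here).
    return 1 if svm_value > 0 else -1
```

In effect, SWPNN acts only as a veto on low-confidence positive SVM predictions; it never turns an SVM negative into a positive.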
Step 5: the ensemble models SSWPNN from Step 4 are each applied to the entire dataset, giving different classification results. The results are then combined by weighted integration using formula (5) to finally identify the zinc-binding protein action sites. The framework model is shown in Fig. 3.
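Formula (5) is not reproduced in this text. A common form of weighted integration is a weighted vote over the k member predictions, which the sketch below assumes; the function name `ensemble_predict` is hypothetical, and the weights β_j are taken as given (for instance derived from each member's error rate).

```python
def ensemble_predict(member_predictions, betas):
    """Weighted vote: sign of sum_j beta_j * f_j(x).

    member_predictions: per-member outputs in {1, -1} for one sample
    betas:              per-member weights
    A tie (sum of 0) is resolved as negative in this sketch.
    """
    total = sum(b * p for b, p in zip(betas, member_predictions))
    return 1 if total > 0 else -1
```

With equal weights this reduces to a simple majority vote; unequal weights let one reliable member outvote several weaker ones.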
Experiments were carried out on a dataset of 392 protein chains, and performance was compared with four existing methods (meta-ZincPrediction, ZincExplorer, zincFinder, zincPred). Whether measured by the overall prediction performance over the four residue types (C, H, E, D) or by the prediction performance for any single residue type, the method of the invention outperforms the other methods.
The invention provides the idea and method of an ensemble learning-based prediction method for zinc-binding protein action sites in an imbalanced mode. There are many ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the invention, and these should also be regarded as falling within its scope of protection. Components not specified in this embodiment can be implemented with existing technology.

Claims (9)

1. An ensemble learning-based method for predicting zinc-binding protein action sites in an imbalanced mode, characterized by comprising the following steps: Step 1: preprocess the protein source data according to the characteristics of zinc-binding protein action sites; Step 2: balance the imbalanced zinc-binding protein action site data by random down-sampling to obtain several balanced sub-datasets; Step 3: on each balanced sub-dataset, select discriminative biochemical features of the protein, represent them, and form feature vectors; Step 4: use the feature vectors as input to the support vector machine base classifier, calculate the sample weights, build a sample-weighted probabilistic neural network model, and finally integrate the support vector machine base classification model and the sample-weighted probabilistic neural network model to obtain the prediction model; Step 5: use the prediction model obtained in step 4 to identify the zinc-binding protein action sites in a target sample.
2. The method according to claim 1, characterized in that, in step 1, the preprocessing removes the following noisy data: (1) peptide chain structures with homology higher than 70% are removed; (2) duplicated and short protein chains, as well as erroneous and unreliable data, are rejected; (3) chains satisfying a sequence redundancy of less than 20% are removed.
3. The method according to claim 1, characterized in that, in step 2, the balancing process applies random down-sampling to the majority-class samples, each time drawing the same number of samples as there are minority-class samples, to form several balanced sub-datasets; the majority-class samples are non-binding protein action sites and the minority-class samples are zinc-binding protein action sites.
4. The method according to claim 1, characterized in that, in step 3, the discriminative biochemical features comprise the position-specific scoring matrix, the conservation score, and RW-GRMTP; the position-specific scoring matrix is normalized and processed with a histogram and a sliding window to obtain a 20-dimensional vector; the 20-dimensional conservation score is converted into a single value; RW-GRMTP is normalized to obtain a 2-dimensional vector; together these form a 23-dimensional feature vector.
5. The method according to claim 1, characterized in that, in step 4, the base classifier SVM (support vector machine) is trained on each balanced sub-dataset, and the prediction error rate e_j and the importance weight α_j of the classification model are calculated according to formula (1) and formula (2), respectively; wherein the full dataset is D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}, x_i ∈ X, X is the instance space of the classification problem, y_i ∈ {1, -1}, i = 1, 2, …, n, and n is the number of samples; w_mi is the weight, with initial value 1/n, i.e. w_1 = (w_11, w_12, …, w_1n) with w_1i = 1/n, i = 1, 2, …, n, m = 1, 2; the base classifier SVM is trained on each of the k balanced sub-datasets to obtain k classification predictions C_svm_j(x), j = 1, …, k.
6. The method according to claim 5, characterized in that, in step 4, the current sample weights are calculated and normalized; if a sample is classified correctly, its weight is reduced; if a sample is misclassified, its weight is increased; the calculation is given by formula (3).
7. The method according to claim 6, characterized in that, in step 4, building the sample-weighted probabilistic neural network model comprises weighting the protein feature data and using the weighted sample data as input to a probabilistic neural network, which makes the prediction; this method is denoted SWPNN and its prediction result is SWPNN(x).
8. The method according to claim 6, characterized in that, in step 4, the support vector machine base classification model and the sample-weighted probabilistic neural network model are integrated to obtain the prediction model SSWPNN, SSWPNN = {SVM, SWPNN, kernelopt, spread, f}, where kernelopt and spread are the parameters of the SVM and SWPNN classifiers, respectively, and f is defined by formula (4); the corresponding weight β_j is calculated from the error rate at the same time; wherein δ is a threshold, and C_svm_j(x) and SWPNN(x) are the classification results of the SVM and SWPNN classifiers, respectively, a value greater than 0 predicting a positive-class sample and a value less than 0 predicting a negative-class sample; if the value of SVM(X) is positive and less than the threshold δ, and SWPNN(X) predicts a negative example, the final ensemble prediction is a negative example; in all other cases, the SVM(X) result is the final decision.
9. The method according to claim 8, characterized in that, in step 5, the ensemble models SSWPNN are each applied to the entire dataset, giving different classification results, and the results are then combined by weighted integration to finally identify the zinc-binding protein action sites in the target sample, as shown in formula (5).
CN201811353819.0A | 2018-11-14 | 2018-11-14 | A Zinc-binding Protein Action Site Prediction Method | Active | CN109326329B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201811353819.0A (CN109326329B (en)) | 2018-11-14 | 2018-11-14 | A Zinc-binding Protein Action Site Prediction Method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201811353819.0A (CN109326329B (en)) | 2018-11-14 | 2018-11-14 | A Zinc-binding Protein Action Site Prediction Method

Publications (2)

Publication Number | Publication Date
CN109326329A | 2019-02-12
CN109326329B | 2020-07-07

Family

Family ID: 65257207

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201811353819.0A | Active | CN109326329B (en) | 2018-11-14 | 2018-11-14 | A Zinc-binding Protein Action Site Prediction Method

Country Status (1)

Country | Link
CN (1) | CN109326329B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN104992079A (en)*2015-06-292015-10-21南京理工大学Sampling learning based protein-ligand binding site prediction method
CN106250718A (en)*2016-07-292016-12-21於铉N based on individually balanced Boosting algorithm1methylate adenosine site estimation method
CN106250718B (en)*2016-07-292018-03-02於铉N based on individually balanced Boosting algorithms1Methylate adenosine site estimation method
CN107273714A (en)*2017-06-072017-10-20南京理工大学The ATP binding site estimation methods of conjugated protein sequence and structural information
CN107194207A (en)*2017-06-262017-09-22南京理工大学Protein ligands binding site estimation method based on granularity support vector machine ensembles

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
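朱非易: "Research on protein-vitamin binding site prediction based on imbalanced learning", China Master's Theses Full-text Database, Basic Sciences *
马军伟: "Research on protein subcellular localization prediction based on machine learning methods", China Doctoral Dissertations Full-text Database, Basic Sciences *
魏志森: "Research on prediction methods for protein interaction sites", China Doctoral Dissertations Full-text Database, Basic Sciences *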

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN109979525A (en) * | 2019-02-28 | 2019-07-05 | Tianjin University | Improved hormone-binding protein qualitative classification method
CN110689920A (en) * | 2019-09-18 | 2020-01-14 | Shanghai Jiao Tong University | A deep learning-based protein-ligand binding site prediction algorithm
CN110689920B (en) * | 2019-09-18 | 2022-02-11 | Shanghai Jiao Tong University | A protein-ligand binding site prediction method based on deep learning
CN111916148A (en) * | 2020-08-13 | 2020-11-10 | China Jiliang University | Method for predicting protein interaction
CN111916148B (en) * | 2020-08-13 | 2023-01-31 | China Jiliang University | Prediction methods for protein interactions
WO2024243799A1 (en) * | 2023-05-30 | 2024-12-05 | Shenzhen Institute of Advanced Technology | Enzyme kinetics parameter prediction method, and electronic device and storage medium

Also Published As

Publication number | Publication date
CN109326329B (en) | 2020-07-07

Similar Documents

Publication | Publication Date | Title
US12040094B2 (en) | | Artificial intelligence-based methods for early drug discovery and related training methods
US9195949B2 (en) | | Data analysis and predictive systems and related methodologies
KR102213670B1 (en) | | Method for prediction of drug-target interactions
CN109326329A (en) | | An ensemble learning-based method for predicting the action sites of zinc-binding proteins in a non-equilibrium model
US20120221501A1 (en) | | Molecular property modeling using ranking
Mamani | | Machine Learning techniques and Polygenic Risk Score application to prediction genetic diseases
Simon | | Genomic clinical trials and predictive medicine
Huang et al. | | A comparative study of discriminating human heart failure etiology using gene expression profiles
Zhao et al. | | Whale optimized mixed kernel function of support vector machine for colorectal cancer diagnosis
Zhang et al. | | DeepPRObind: modular deep learner that accurately predicts structure and disorder-annotated protein binding residues
Phan et al. | | Functional genomics and proteomics in the clinical neurosciences: data mining and bioinformatics
CN113921094B (en) | | Prediction model of anti-HBV small molecule drugs based on representation learning and its construction method
Li et al. | | EfficientNet-resDDSC: A Hybrid Deep Learning Model Integrating Residual Blocks and Dilated Convolutions for Inferring Gene Causality in Single-Cell Data
WO2008007630A1 (en) | | Method of searching for protein and apparatus therefor
US20130218581A1 (en) | | Stratifying patient populations through characterization of disease-driving signaling
ShahrjooiHaghighi et al. | | Ensemble feature selection for biomarker discovery in mass spectrometry-based metabolomics
Kalya et al. | | Machine Learning based Survival Group Prediction in Glioblastoma
Maddalena et al. | | A framework based on metabolic networks and biomedical images data to discriminate glioma grades
Li et al. | | A novel prediction method for zinc-binding sites in proteins by an ensemble of SVM and sample-weighted probabilistic neural network
Abraham et al. | | CWAOMT: Class weight balanced artificial neural network model for the classification of ovarian malignancy from transcriptomic profiles
Basava | | Revolutionizing Personalized Cancer Vaccines with NEO: Novel Epitope Optimization Using an Aggregated Feed Forward and Recurrent Neural Network with LSTM Architecture
Akhavan-Safar et al. | | Colorectal cancer driver gene detection in human gene regulatory network using an independent cascade diffusion model
Kameswari et al. | | A Tumor Classification Algorithm Utilizing Extreme Gradient Boosting
Feng et al. | | Statistical considerations in combining biomarkers for disease classification
CN120452555B (en) | | MHC-presented peptide prediction method and system based on multimodal deep learning

Legal Events

Date | Code | Title | Description
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | GR01 | Patent grant |
