bzhanglab/DeepVEP

DeepVEP: mutation impact prediction on post-translational modifications using deep learning

Table of contents:

Installation

Download DeepVEP

$ git clone https://github.com/bzhanglab/DeepVEP

Installation

DeepVEP is a Python 3 package that supports TensorFlow (>=2.6). Install its dependencies with:

$ pip install -r requirements.txt

DeepVEP has been tested on both Linux and Windows systems. It supports training and prediction on both CPU and GPU.

Download model files

The pretrained model files are available at the DeepVEP model repository. After downloading the model files, decompress the *.tar.gz file and move all the files into the models folder, as shown below:

├── README.md
├── deepvep.py
├── lib
│   ├── DataIO.py
│   ├── Metrics.py
│   ├── ModelView.py
│   ├── MutationUtils.py
│   ├── PTModels.py
│   ├── PeptideEncode.py
│   ├── RegCallback.py
│   ├── Utils.py
│   └── __init__.py
├── models
└── requirements.txt

Usage

Run the following command line to show all command line options:

python deepvep.py predict -h
  -h or --help            show this help message and exit
  -i or --input           Input data for prediction
  -d or --db              Protein database
  -o or --out_dir         Output directory
  -w or --window_size     Window size for mutation impact prediction. In default, 7 amino acids in both side of a mutation.
  -m or --model           Trained model path
  -t or --task            Prediction type: 1=Mutation impact prediction, 2=PTM site prediction
  -e or --ensemble        Ensemble method, 1: average, 2: meta_lr, default is 1.
  -s or --explain_model   Perform model interpretability analysis
  -b or --bg_data         Data used as background data in model interpretability
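When DeepVEP is called from a pipeline, it can help to assemble the command line in one place. The sketch below is only an illustration: `build_predict_command` is a hypothetical helper, not part of DeepVEP. It uses only the flags listed above, and it assumes the script is `deepvep.py`, as in the repository layout and the example commands in this README. It also assumes that task 1 needs an `-i` file, because the mutation impact example passes one.

```python
import shlex


def build_predict_command(model_dir, db, out_dir, task, input_tsv=None, window_size=None):
    """Assemble the argument list for `deepvep.py predict` (hypothetical helper)."""
    if task not in (1, 2):
        raise ValueError("task must be 1 (mutation impact) or 2 (PTM site prediction)")
    if task == 1 and input_tsv is None:
        # Assumption: task 1 needs a mutation table, as in the README example.
        raise ValueError("task 1 expects a mutation table for -i")

    cmd = ["python", "deepvep.py", "predict",
           "-m", model_dir, "-d", db, "-t", str(task), "-o", out_dir]
    if input_tsv is not None:
        cmd += ["-i", input_tsv]
    if window_size is not None:
        cmd += ["-w", str(window_size)]
    return cmd


cmd = build_predict_command(
    model_dir="models/",
    db="example/Q5S007.fasta",
    out_dir="mutation_output_folder",
    task=1,
    input_tsv="example/mutation_input.tsv",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the model files are in place.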

Mutation impact prediction:

Below is an example of mutation impact prediction. The input for -i is a TSV file that contains the mutation information. The input for -d is a protein database file in FASTA format that contains the wild-type protein sequences.

python deepvep.py predict -m models/ -i example/mutation_input.tsv -d example/Q5S007.fasta -t 1 -o mutation_output_folder

The required columns in the -i input are Protein, AA_Ref, AA_Pos and AA_Var. An example ("example/mutation_input.tsv") is shown below:

Protein  AA_Ref  AA_Pos  AA_Var
Q5S007   R       1441    C
Q5S007   R       1441    H

The columns of the input file "example/mutation_input.tsv" are described below:

Column name  Description
Protein      the protein name in the input fasta file
AA_Ref       the wild type amino acid
AA_Pos       the mutation position on the protein (1-based: the position of the first amino acid on the protein is 1)
AA_Var       the mutation amino acid
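The input table can also be generated programmatically. The following is a minimal sketch using only the Python standard library. The helper names `write_mutation_tsv` and `missing_columns` are hypothetical and not part of DeepVEP. The column names and the two example rows are taken from the table above.

```python
import csv
import io

REQUIRED_COLUMNS = ["Protein", "AA_Ref", "AA_Pos", "AA_Var"]


def write_mutation_tsv(handle, mutations):
    """Write (protein, ref, pos, var) tuples as a tab-separated table."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(REQUIRED_COLUMNS)
    for protein, ref, pos, var in mutations:
        if int(pos) < 1:
            # AA_Pos is 1-based, so 0 or negative positions are invalid.
            raise ValueError("AA_Pos must be >= 1")
        writer.writerow([protein, ref, int(pos), var])


def missing_columns(handle):
    """Return the required columns that are absent from a TSV header."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    return [name for name in REQUIRED_COLUMNS if name not in header]


buffer = io.StringIO()
write_mutation_tsv(buffer, [("Q5S007", "R", 1441, "C"), ("Q5S007", "R", 1441, "H")])
tsv_text = buffer.getvalue()
print(tsv_text)
```

To write a real file, pass `open("mutation_input.tsv", "w", newline="")` in place of the in-memory buffer.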

For the example input above (-i), the wild-type sequence of protein Q5S007 must be present in the protein database given to -d. The first 10 lines of the file "example/Q5S007.fasta" are shown below:

>Q5S007
MASGSCQGCEEDEETLKKLIVRLNNVQEGKQIETLVQILEDLLVFTYSERASKLFQGKNI
HVPLLIVLDSYMRVASVQQVGWSLLCKLIEVCPGTMQSLMGPQDVGNDWEVLGVHQLILK
MLTVHNASVNLSVIGLKTLDLLLTSGKITLLILDEESDIFMLIFDAMHSFPANDEVQKLG
CKALHVLFERVSEEQLTEFVENKDYMILLSALTNFKDEEEIVLHVLHCLHSLAIPCNNVE
VLMSGNVRCYNIVVEAMKAFPMSERIQEVSCCLLHRLTLGNFFNILVLNEVHEFVVKAVQ
QYPENAALQISALSCLALLTETIFLNQDLEEKNENQENDDEGEEDKLFWLEACYKALTWH
RKNKHVQEAACWALNNLLMYQNSLHEKIGDEDGHFPAHREVMLSMLMHSSSKEVFQASAN
ALSTLLEQNVNFRKILLSKGIHLNVLELMQKHIHSPEVAESGCKMLNHLFEGSNTSLDIM
AAVVPKILTVMKRHETSLPVQLEALRAILHFIVPGMPEESREDTEFHHKLNMVKKQCFKN
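Before running a prediction, it may be worth checking that each AA_Ref agrees with the residue at AA_Pos in the FASTA database. The sketch below illustrates such a check. `read_fasta` and `reference_matches` are hypothetical helpers, and matching on the first word of the FASTA header is an assumption about how protein names are resolved. The demo uses only the first 60 residues of Q5S007, so positions beyond 60 (such as 1441) are reported as not matching here.

```python
def read_fasta(lines):
    """Parse FASTA lines into {record_id: sequence}.

    Assumption: the record id is the first whitespace-delimited word after '>'.
    """
    parts = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            words = line[1:].split()
            current = words[0] if words else ""
            parts[current] = []
        elif current is not None:
            parts[current].append(line)
    return {name: "".join(chunks) for name, chunks in parts.items()}


def reference_matches(sequences, protein, aa_ref, aa_pos):
    """Check that the residue at the 1-based position aa_pos equals aa_ref."""
    seq = sequences.get(protein)
    if seq is None or not 1 <= aa_pos <= len(seq):
        return False
    return seq[aa_pos - 1] == aa_ref


# Truncated demo record: header plus the first sequence line of Q5S007.
fasta_lines = [
    ">Q5S007",
    "MASGSCQGCEEDEETLKKLIVRLNNVQEGKQIETLVQILEDLLVFTYSERASKLFQGKNI",
]
sequences = read_fasta(fasta_lines)
print(reference_matches(sequences, "Q5S007", "K", 17))
```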

The output folder ("mutation_output_folder") of the example command line looks like below:

mutation_output_folder
├── deepvep-mutation_impact.tsv
├── acetylation_k
├── glycosylation_n
├── methylation_k
├── methylation_r
├── phosphorylation_st
├── phosphorylation_y
├── sumoylation_k
└── ubiquitination_k

The output file "mutation_output_folder/deepvep-mutation_impact.tsv" contains the predicted mutation impact on all the PTM sites supported by DeepVEP. This is the only file that users need to use for downstream analysis. The other folders contain intermediate prediction files.

Protein  AA_Ref  AA_Pos  AA_Var  pos   diff_pos  w_pep            m_pep            w_prob      m_prob      delta_prob   ptm
Q5S007   R       1441    C       1443  -2        FNIKARASSSPVILV  FNIKACASSSPVILV  0.7837325   0.2552052   -0.5285273   phosphorylation_st
Q5S007   R       1441    C       1444  -3        NIKARASSSPVILVG  NIKACASSSPVILVG  0.88949174  0.43282443  -0.45666731  phosphorylation_st
Q5S007   R       1441    C       1445  -4        IKARASSSPVILVGT  IKACASSSPVILVGT  0.8771168   0.49486965  -0.38224715  phosphorylation_st
Q5S007   R       1441    H       1443  -2        FNIKARASSSPVILV  FNIKAHASSSPVILV  0.7837325   0.26106784  -0.52266466  phosphorylation_st
Q5S007   R       1441    H       1444  -3        NIKARASSSPVILVG  NIKAHASSSPVILVG  0.88949174  0.44303632  -0.44645542  phosphorylation_st
Q5S007   R       1441    H       1445  -4        IKARASSSPVILVGT  IKAHASSSPVILVGT  0.8771168   0.48524436  -0.39187244  phosphorylation_st

The columns of the output file "mutation_output_folder/deepvep-mutation_impact.tsv" are described below:

Column name  Description
Protein      the protein name in the input fasta file
AA_Ref       the wild type amino acid
AA_Pos       the mutation position on the protein (1-based: the position of the first amino acid on the protein is 1)
AA_Var       the mutation amino acid
pos          the PTM site position on the protein (1-based: the position of the first amino acid on the protein is 1)
diff_pos     the signed distance between the mutation site and the PTM site (AA_Pos - pos in the example above)
w_pep        the wild type peptide sequence in which the center is the PTM site
m_pep        the mutant peptide sequence in which the center is the PTM site
w_prob       predicted PTM site probability for the wild type sequence
m_prob       predicted PTM site probability for the mutant sequence
delta_prob   mutation impact on the PTM site: m_prob - w_prob
ptm          PTM name
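The relationship between these columns can be checked on the first example row. The sketch below rests on two assumptions inferred from the example rows rather than from the DeepVEP code: the PTM site sits at the middle index of w_pep, and diff_pos equals AA_Pos - pos. `mutate_peptide` is a hypothetical helper, not a DeepVEP function.

```python
def mutate_peptide(w_pep, diff_pos, aa_ref, aa_var):
    """Rebuild the mutant peptide from the wild type peptide.

    Assumes the PTM site is the middle residue of w_pep and that
    diff_pos is the mutation position minus the PTM site position.
    """
    center = len(w_pep) // 2       # index of the PTM site in the window
    index = center + diff_pos      # index of the mutated residue
    if not 0 <= index < len(w_pep):
        raise ValueError("mutation falls outside the peptide window")
    if w_pep[index] != aa_ref:
        raise ValueError("reference residue does not match the peptide")
    return w_pep[:index] + aa_var + w_pep[index + 1:]


# First row of the example output above.
row = {
    "AA_Ref": "R", "AA_Pos": 1441, "AA_Var": "C",
    "pos": 1443, "diff_pos": -2,
    "w_pep": "FNIKARASSSPVILV", "m_pep": "FNIKACASSSPVILV",
    "w_prob": 0.7837325, "m_prob": 0.2552052, "delta_prob": -0.5285273,
}

rebuilt = mutate_peptide(row["w_pep"], row["diff_pos"], row["AA_Ref"], row["AA_Var"])
delta = row["m_prob"] - row["w_prob"]

print(rebuilt == row["m_pep"])
print(abs(delta - row["delta_prob"]) < 1e-6)
```

A negative delta_prob, as in this row, means the mutation lowers the predicted PTM probability at that site.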

The prediction took less than 3 minutes using CPU on a Linux server (64G RAM and 16 CPUs).

PTM site prediction:

Below is an example for PTM site prediction. The input (-d) is a protein database file in FASTA format which contains the protein sequences to predict. The following command line is used to predict all PTM sites supported by DeepVEP.

python deepvep.py predict -m models/ -d example/Q5S007.fasta -t 2 -o output_folder

The output folder ("output_folder") of the example command line looks like below:

output_folder/
├── site_prediction.tsv
├── acetylation_k
├── glycosylation_n
├── methylation_k
├── methylation_r
├── phosphorylation_st
├── phosphorylation_y
├── sumoylation_k
└── ubiquitination_k

The output file "output_folder/site_prediction.tsv" contains all the predicted PTM sites. This is the only file that users need to use for downstream analysis. The other folders contain intermediate prediction files.

protein  aa  pos   x                                y_pred          fpr                 ptm
Q5S007   K   17    ASGSCQGCEEDEETLKKLIVRLNNVQEGKQI  0.99253386      0.0027506112469437  acetylation_k
Q5S007   K   18    SGSCQGCEEDEETLKKLIVRLNNVQEGKQIE  0.54781103      0.1937652811735941  acetylation_k
Q5S007   K   30    TLKKLIVRLNNVQEGKQIETLVQILEDLLVF  0.0013190061    0.9529339853300732  acetylation_k
Q5S007   K   53    ILEDLLVFTYSERASKLFQGKNIHVPLLIVL  0.70806825      0.1253056234718826  acetylation_k
Q5S007   K   58    LVFTYSERASKLFQGKNIHVPLLIVLDSYMR  0.0148314405    0.7331907090464548  acetylation_k
Q5S007   K   87    MRVASVQQVGWSLLCKLIEVCPGTMQSLMGP  0.0005026354    0.9819682151589242  acetylation_k
Q5S007   K   120   VGNDWEVLGVHQLILKMLTVHNASVNLSVIG  0.00056051335   0.9801344743276283  acetylation_k
Q5S007   K   137   LTVHNASVNLSVIGLKTLDLLLTSGKITLLI  0.007324457     0.82059902200489    acetylation_k
Q5S007   K   147   SVIGLKTLDLLLTSGKITLLILDEESDIFML  0.0003533643    0.9865525672371638  acetylation_k

Below please find the description of each column in the output file "output_folder/site_prediction.tsv":

Column name  Description
protein      the protein name from the input fasta file
aa           the amino acid PTM site to predict
pos          the PTM site on the protein (1-based: the position of the first amino acid on the protein is 1)
x            a peptide sequence of 31 amino acids in which the center is the predicted PTM site
y_pred       predicted probability
fpr          false positive rate using the predicted probability as the threshold to define positive PTM site
ptm          PTM name
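For downstream analysis, one common step is to keep only the sites whose fpr is below a chosen cutoff. The sketch below shows this with the standard `csv` module on three rows copied from the example output. The helper name `confident_sites` and the cutoff of 0.01 are illustrative assumptions, not DeepVEP recommendations.

```python
import csv
import io

# Three rows copied from the example site_prediction.tsv above.
SITE_TSV = (
    "protein\taa\tpos\tx\ty_pred\tfpr\tptm\n"
    "Q5S007\tK\t17\tASGSCQGCEEDEETLKKLIVRLNNVQEGKQI\t0.99253386\t0.0027506112469437\tacetylation_k\n"
    "Q5S007\tK\t18\tSGSCQGCEEDEETLKKLIVRLNNVQEGKQIE\t0.54781103\t0.1937652811735941\tacetylation_k\n"
    "Q5S007\tK\t53\tILEDLLVFTYSERASKLFQGKNIHVPLLIVL\t0.70806825\t0.1253056234718826\tacetylation_k\n"
)


def confident_sites(handle, max_fpr):
    """Return the rows whose false positive rate is at most max_fpr."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["fpr"]) <= max_fpr]


sites = confident_sites(io.StringIO(SITE_TSV), max_fpr=0.01)
for site in sites:
    print(site["protein"], site["aa"], site["pos"], site["ptm"], site["y_pred"])
```

To apply the same filter to a real result, open "output_folder/site_prediction.tsv" and pass the file handle instead of the in-memory string.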

The prediction took less than 3 minutes using CPU on a Linux server (64G RAM and 16 CPUs).

How to cite:

There is no manuscript to cite yet.
