Estimate PTM hotspots in protein sequence alignments
evocellnet/ptm_hotspots
Method to infer PTM (currently only phosphorylation) hotspots in conserved protein alignments, based on Strumillo et al. (bioRxiv).
ptm_hotspot requires Python (v3) as well as the following packages:
- numpy
- pandas
- scipy
- statsmodels
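Before running the tool, it can help to confirm that the four packages are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of ptm_hotspot.

```python
import importlib.util

# Packages listed as requirements in this README.
REQUIRED = ["numpy", "pandas", "scipy", "statsmodels"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any package reported as missing can then be installed with your usual package manager.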
If you want to predict all hotspots in all protein domains just run:
python3 ptm_hotspots.py -o predicted_hotspots.csv
To obtain particular domain predictions:
python3 ptm_hotspots.py -o kinase_domain_hotspots.csv -d PF00069
To obtain hotspot residue predictions instead of hotspot ranges:
python3 ptm_hotspots.py -o hotspot_residues.csv --printSites
You can obtain more help and options by typing:
python3 ptm_hotspots.py -h
    usage: ptm_hotspots.py [-h] [--dir [PATH]] [--ptmfile [PATH]] [-d [PFXXXXX]]
                           [--iter [INTEGER]] [--threshold [FLOAT]]
                           [--foreground [FLOAT]] -o PATH [--printSites]

    Estimate PTM hotspots in sequence alignments

    optional arguments:
      -h, --help            show this help message and exit
      --dir [PATH]          fasta alignments dir (default: db/alignments)
      --ptmfile [PATH]      file containing PTMs (default: db/all_phosps)
      -d [PFXXXXX], --domain [PFXXXXX]
                            query single domain (i.e. PF00069)
      --iter [INTEGER]      number of permutations (default: 100)
      --threshold [FLOAT]   Corrected p-value threshold (default: 0.01)
      --foreground [FLOAT]  effect-size foreground cutoff (default: 2)
      -o PATH, --out PATH   output csv file
      --printSites          print all residue predictions
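If you want to drive the script from Python, for example to loop over several domains, the options listed in the help text can be assembled into a command list. This is only a sketch: `build_command` is a hypothetical helper, and it assumes `ptm_hotspots.py` is in the current working directory and that `python3` is on your PATH. It uses only flags shown in the help output.

```python
import subprocess


def build_command(out_path, domain=None, iterations=None,
                  threshold=None, print_sites=False):
    """Assemble a ptm_hotspots.py command line from documented options."""
    cmd = ["python3", "ptm_hotspots.py", "-o", out_path]
    if domain is not None:
        cmd += ["-d", domain]                  # query a single Pfam domain
    if iterations is not None:
        cmd += ["--iter", str(iterations)]     # number of permutations
    if threshold is not None:
        cmd += ["--threshold", str(threshold)] # corrected p-value threshold
    if print_sites:
        cmd.append("--printSites")             # residue-level predictions
    return cmd


if __name__ == "__main__":
    cmd = build_command("kinase_domain_hotspots.csv", domain="PF00069")
    print(" ".join(cmd))
    # Uncomment to actually run the tool from the repository root:
    # subprocess.run(cmd, check=True)
```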
Note: Since the Bonferroni correction depends on the total number of predictions, small differences might emerge in the hotspots predicted for the same domain depending on whether you run a single domain or the full set of domains. Similarly, the stochastic nature of the permutation analysis might make the results vary between runs.
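The effect of the number of predictions on the Bonferroni correction can be illustrated with a small, self-contained sketch. It is not the code used by ptm_hotspot: the function name, the raw p-value and the test counts are all hypothetical, and the `<=` comparison against the threshold is an assumption. Only the default threshold of 0.01 is taken from the help text.

```python
def bonferroni_significant(p_value, n_tests, alpha=0.01):
    """Return True if a raw p-value survives Bonferroni correction.

    The corrected p-value is the raw p-value multiplied by the number
    of tests, capped at 1.0.
    """
    corrected = min(1.0, p_value * n_tests)
    return corrected <= alpha


if __name__ == "__main__":
    raw_p = 0.0005  # hypothetical raw p-value for one candidate hotspot

    # Few tests (e.g. a single-domain run): corrected p = 0.005
    print(bonferroni_significant(raw_p, n_tests=10))

    # Many tests (e.g. a full run over all domains): corrected p = 0.5
    print(bonferroni_significant(raw_p, n_tests=1000))
```

The same raw p-value passes the threshold in the small run and fails it in the large one, which is why a borderline hotspot can appear or disappear between the two modes.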
By default, ptm_hotspot uses a database containing precalculated domain alignments (as described in Strumillo et al.) as well as a collection of phosphorylated residues derived from public high-throughput mass spectrometry experiments. To update the database, please consider the following points:
Every alignment file should be in FASTA format and the header should contain the start and the end of the domains in the alignment coordinates separated by ";". For example:
>EDP05298 pep:known supercontig:v3.1:DS4 ;51;337
For full protein predictions just include the first and last positions in the multiple sequence alignment.
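To check that your own alignment headers follow this convention, a header can be split into its identifier and domain coordinates. This is a sketch under the assumption that the start and end are always the last two ";"-separated fields, as in the example above; `parse_domain_header` is a hypothetical helper and not part of ptm_hotspot.

```python
def parse_domain_header(header):
    """Split a FASTA header into (sequence id, domain start, domain end).

    Assumes the header ends with ';<start>;<end>' in alignment coordinates.
    Raises ValueError if the coordinates are missing or not integers.
    """
    text = header.lstrip(">").rstrip()
    description, start, end = text.rsplit(";", 2)
    seq_id = description.split()[0]  # first whitespace-delimited token
    return seq_id, int(start), int(end)


if __name__ == "__main__":
    example = ">EDP05298 pep:known supercontig:v3.1:DS4 ;51;337"
    print(parse_domain_header(example))
```

A header without the two trailing coordinates raises a `ValueError`, which makes malformed entries easy to spot before running the predictions.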
The PTM database should be provided as a csv file containing the id, amino acid and position of each phosphosite within the protein.
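As an illustration, such a file could be written with Python's standard `csv` module. This sketch makes several assumptions that you should check against the bundled `db/all_phosps` file: that the columns appear in the order id, amino acid, position, and that there is no header row. The function name and the example sites are hypothetical.

```python
import csv
import io


def write_ptm_rows(rows, handle):
    """Write (protein id, amino acid, position) tuples as csv rows.

    Column order and the absence of a header row are assumptions.
    """
    writer = csv.writer(handle)
    for protein_id, residue, position in rows:
        writer.writerow([protein_id, residue, int(position)])


# Hypothetical phosphosites, for illustration only.
example_sites = [("EDP05298", "S", 120), ("EDP05298", "T", 254)]

buffer = io.StringIO()
write_ptm_rows(example_sites, buffer)

if __name__ == "__main__":
    print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` as the handle, and point the tool at it with `--ptmfile`.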
- Strumillo, M. J., Oplova, M., Viéitez, C., Ochoa, D., Shahraz, M., Busby, B. P., et al. Sopko, M., Studer, R. A., Perrimon, N., Panse, V. G., Beltrao, P. (2018). Conserved phosphorylation hotspots in eukaryotic protein domain families. bioRxiv. https://doi.org/10.1101/391185