- Notifications
You must be signed in to change notification settings - Fork26
3D hotspot mutation proximity analysis tool
License
ding-lab/hotspot3d
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This 3D proximity tool can be used to identify mutation hotspots from linear protein sequence and correlate the hotspots with known or potentially interacting domains, mutations, or drugs. Mutation-mutation and mutation-drug clusters can also be identified and viewed.
Program: HotSpot3D - 3D mutation proximity analysis program. Stable: v0.6.0 Beta: up to v1.8.2 Author: Beifang Niu, John Wallis, Adam D Scott, Sohini Sengupta, & Amila Weerasinghe
Usage: hotspot3d [options]
Preprocessing drugport -- 0) Parse drugport database (OPTIONAL) uppro -- 1) Update proximity files prep -- 2) Run preprocessing steps 2a-2f calroi -- 2a) Generate region of interest (ROI) information statis -- 2b) Calculate p_values for pairs of mutations anno -- 2c) Add region of interest (ROI) annotation trans -- 2d) Add transcript annotation cosmic -- 2e) Add COSMIC annotation to proximity file prior -- 2f) Prioritization Analysis main -- Run analysis steps a-f (beta) search -- a) 3D mutation proximity searching cluster -- b) Determine mutation-mutation and mutation-drug clusters sigclus -- c) Determine significance of clusters (BETA/OPTIONAL) summary -- d) Summarize clusters (OPTIONAL) visual -- e) Visulization of 3D proximity (OPTIONAL)
For user support please emailruiyang.liu@wustl.edu &bniu@sccas.cn
To reinstall code of the same version (in some cases, may need --sudo):
cpanm --reinstall HotSpot3D-#.tar.gz
Make sure that you have cpanm:
cpan App::cpanminus
For configuration, we recommend using local::lib:
cpanm --local-lib=~/perl5 local::lib && eval $(perl -I ~/perl5/lib/perl5/ -Mlocal::lib)
Dependencies include the modules: LWP::Simple, Test::Most, List::Util, List::MoreUtils, Parallel::ForkManager
cpanm LWP::Simplecpanm Test::Mostcpanm List::Utilcpanm List::MoreUtilscpanm Parallel::ForkManager
Install HotSpot3D package:
git clone https://github.com/ding-lab/hotspot3dcd hotspot3d
For the latest stable version:
git checkout v0.6.0cpanm HotSpot3D-0.6.0.tar.gz
For the latest beta version:
git pull origin v1.8.2cpanm HotSpot3D-1.8.2.tar.gz
Final note: Installations under some organizations may use an internal perl version.To make use of the /usr/ perl, edit the first line of ~/perl5/bin/hotspot3d.
from: #!/org/bin/perlto: #!/usr/bin/perl
We have developed webservice for HotSpot3D, and you can submit your mutation file to doanalysis online here:http://niulab.scgrid.cn/HotSpot3D/ , once you don't want to installHotSpot3D locally.
It is helpful to add your perl5 lib directory, and to add your perl5 bin directory.You can add the following lines to your ~/.bash_profile. Then run 'source ~/.bash_profile'.export PERL5LIB=~/perl5/lib/perl5/:${PERL5LIB}export PERL5BIN=~/perl5/bin/:${PERL5BIN}export PATH=~/perl5/bin/:${PATH}Add cosmic v67 information to 3D proximity results :mkdir preprocessing_dir/cosmiccp COSMIC/cosmic_67_for_HotSpot3D_missense_only.tsv.bz2 ./preprocessing_dir/cosmic/cd ./preprocessing_dir/cosmic/ bzip2 -d cosmic_67_for_HotSpot3D_missense_only.tsv.bz2
Go tohttps://www.synapse.org/#!Synapse:syn8699796, and check out the wiki for any updates/details.
Select the Files tab, then go into the AverageResidueDistance data directory (syn8717211).
The DrugPort processing results are located here (syn9704835) for those interested.
Select the reference/assembly version of interest (GRCh37 with Ensembl version 74 (syn9701918), or GRCh38 with Ensembl version 87 (syn9704851)).
You will need to download the hugo.uniprot.pdb.transcript.csv (syn9704852).
Two download options are available, prioritization.tar.gz (syn9704853) contains all human proteins that have been preprocessed. This is a large file that can take an hour or more depending on internet speeds. Alternatively, you can download the prioritization/ (syn9705109) or any specific protein proximity files within. The proximity files are compressed for faster/more targeted downloading.
NOTE: Proximity data only contains pairs within 20Angstroms between mutations. This should be sufficient for many HotSpot3D applications.
(Optional) Run drugport module to parse Drugport data and generate a drugport parsing results flat file :
hotspot3d drugport --pdb-file-dir=pdb_files_dir
Run 3D proximity calculation that also updates any existing preprocessed data (default launches LSF jobs) :
hotspot3d uppro --output-dir=preprocessing_dir --pdb-file-dir=pdb_files_dir --drugport-file=drugport_parsing_results_file 1>hotspot3d.uppro.err 2>hotspot3d.uppro.out
Run automated preprocessing for other measurments and annotations (can alternatively run steps 2a-2f individually) :
hotspot3d prep --output-dir=preprocessing_dir
3D proximity searching based on prioritization results and visualization
Proximity searching (acquire proximity information for input mutations):
hotspot3d search --maf-file=your.maf --prep-dir=preprocessing_dir
Cluster pairwise data:
hotspot3d cluster --pairwise-file=3D_Proximity.pairwise --maf-file=your.maf
Cluster significance calculation:
hotspot3d sigclus --prep-dir=preprocessing_dir --pairwise-file=3D_Proximity.pairwise --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters
Clustering Summary:
hotspot3d summary --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters
Visualization (works with PyMol):
hotspot3d visual --pairwise-file=3D_Proximity.pairwise --clusters-file=3D_Proximity.pairwise.singleprotein.collapsed.clusters --pdb=3XSR
Check out scripts/ for various annotation scripts to add more details to the .clusters file.
HGNC download can be found here:http://www.genenames.org/cgi-bin/genefamilies/.
Information on the Ensembl .gtf can be found here:http://useast.ensembl.org/info/website/upload/gff.html, and downloads can be found at the Ensembl ftp site, ftp://ftp.ensembl.org/pub/.
See the scripts/README.annotations for more details.
Mutation file - Standard .maf with custom coding transcript and protein annotations (ENST00000275493 and p.L858R)
There are only a handful of columns necessary from .maf files. They are:
Hugo_SymbolChromosomeStart_PositionEnd_PositionVariant_ClassificationReference_AlleleTumor_Seq_Allele1Tumor_Seq_Allele2Tumor_Sample_Barcode
And two non-standard columns:
a transcript ID columna protein peptide change column (HGVS p. single letter abbreviations, ie p.T790M)
Current Annotation Support:
Transcript ID - Ensembl coding transcript ID's (ENST)Gene name - HUGO symbol
Clustering with different pairs data:
For monomers, you need to include the option '--meric-type monomer'For homomers, you need to include the option '--meric-type homomer'For heteromers, you need to include the option '--meric-type heteromer'For both homomers & heteromers simultaneously, you need to include the option '--meric-type multimer'For no regard to *mer status, you can include the option '--meric-type unspecified', although this is run by default without the optionFor DrugPort only, do not input the .pairwise file; input only DrugPort pairs file.For *mer+DrugPort include the .pairwise file with the DrugPort pairs file, and include the appropriate --meric-type as described above.
Clustering based on different distance measures:
There are some pairs found on multiple structures. In HotSpot3D versions v0.6.2 and earlier, clustering only used the shortest distance among different structures (shortest structure distance, SSD). In HotSpot3D versions v0.6.3 and later, clustering can be done using the average distance among different structures (average structure distance, ASD), and this is now default.
If you use HotSpot3D in your research, please cite:
- Protein-structure-guided discovery of functional mutations across 19 cancer types; Niu B, Scott AD, Sengupta S, Bailey MH, Batra P, Ning J, Wyczalkowski MA, Liang WW, Zhang Q, McLellan MD, Sun SQ, Tripathi P, Lou C, Ye K, Mashl RJ, Wallis J, Wendl MC, Chen F, Ding L; Nat Genet 2016 Aug;48(8):827-37
About
3D hotspot mutation proximity analysis tool