The pipeline ties together several useful tools, each of which can be run manually almost as easily if you have trouble installing the pipeline.
These are:
Generating ancestral repeats on the ancestral genome
A FASTA file is extracted for the specified ancestral genome (usually the root of the subtree you are interested in), and RepeatMasker is run on it. As a postprocessing step, tRNAs and low-complexity repeats are filtered out, and the .out file is converted to a BED file containing the remaining repeats.
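If you need to do the postprocessing step by hand, the following is a minimal sketch of the filtering and conversion, not the pipeline's own code. It assumes the standard whitespace-separated RepeatMasker .out layout (query sequence in column 5, begin/end in columns 6 and 7, strand in column 9, repeat name and class/family in columns 10 and 11) and assumes the classes to drop are labeled `tRNA` and `Low_complexity`. The function name and the choice of a BED6 layout with a placeholder score are illustrative.

```python
# Repeat classes to drop. These labels are an assumption about how
# RepeatMasker names tRNAs and low-complexity repeats in your run.
EXCLUDED_CLASSES = {"tRNA", "Low_complexity"}


def repeatmasker_out_to_bed(lines, excluded=EXCLUDED_CLASSES):
    """Yield one BED6 line per kept annotation in a RepeatMasker .out file."""
    for line in lines:
        fields = line.split()
        # Header and blank lines do not start with a numeric score; skip them.
        if len(fields) < 14 or not fields[0].isdigit():
            continue
        chrom = fields[4]
        # .out coordinates are 1-based inclusive; BED is 0-based half-open.
        start, end = int(fields[5]) - 1, int(fields[6])
        # RepeatMasker marks the reverse strand with "C".
        strand = "-" if fields[8] == "C" else "+"
        name, rep_class = fields[9], fields[10]
        # Compare on the class part of "class/family".
        if rep_class.split("/")[0] in excluded:
            continue
        # The score column is a placeholder (0), not the alignment score.
        yield "\t".join([chrom, str(start), str(end), name, "0", strand])


if __name__ == "__main__":
    import sys

    # Usage: python out_to_bed.py < ancestor.fa.out > ancestralRepeats.bed
    for bed_line in repeatmasker_out_to_bed(sys.stdin):
        print(bed_line)
```

The file names in the usage comment are hypothetical. Extend `EXCLUDED_CLASSES` if you also want to drop other classes.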
halPhyloPTrain.py
This trains the PHAST model on a randomly sampled set of bases. (There are diminishing returns past tens of thousands of bases, depending on the size of the alignment.) Internally, halPhyloPTrain.py is a very simple combination of hal2maf and phyloFit. We suggest using the REV model.
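To run this step manually, the two underlying commands can be assembled roughly as in the sketch below. This is an approximation of what the script combines, not a copy of halPhyloPTrain.py: it skips the random sampling, and the file names, genome name, and tree are hypothetical placeholders. The flags shown (`--refGenome` and `--refTargets` for hal2maf; `--tree`, `--subst-mod`, `--msa-format`, and `--out-root` for phyloFit) are the ones I would expect to need, so check each tool's `--help` output against your installed versions before relying on them.

```python
import shlex


def build_training_commands(hal_path, ref_genome, repeats_bed, tree, out_root,
                            subst_mod="REV"):
    """Return (hal2maf argv, phyloFit argv) for training a neutral model.

    Nothing is executed here; the lists can be passed to subprocess.run.
    """
    maf_path = out_root + ".maf"
    # Export only the ancestral-repeat intervals of the reference genome.
    hal2maf_cmd = [
        "hal2maf", hal_path, maf_path,
        "--refGenome", ref_genome,
        "--refTargets", repeats_bed,
    ]
    # Fit the substitution model on the exported MAF.
    phylofit_cmd = [
        "phyloFit",
        "--tree", tree,
        "--subst-mod", subst_mod,
        "--msa-format", "MAF",
        "--out-root", out_root,
        maf_path,
    ]
    return hal2maf_cmd, phylofit_cmd


# Hypothetical inputs, for illustration only.
for cmd in build_training_commands(
    hal_path="alignment.hal",
    ref_genome="Anc0",
    repeats_bed="ancestralRepeats.bed",
    tree="((genomeA,genomeB),genomeC)",
    out_root="neutralModel",
):
    print(shlex.join(cmd))
```

The tree passed to phyloFit must use the same genome names as the rows of the MAF. phyloFit writes the fitted model to the out-root name with a `.mod` extension.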
Rescaling
You may be interested in the difference in rate between different sets of chromosomes (say, sex vs. autosomal chromosomes). There is an option within the pipeline to rescale the model to reflect the overall rate in different chromosomes. I instead suggest using halLiftover to lift the ancestral repeats to a genome with defined chromosomes, filtering the lifted BED to create separate BEDs for each set of interest, then using phyloFit to infer a new model for each set like so: