ipyrad-analysis toolkit: mrBayes

Bayesian phylogenetic inference can provide several advantages over ML approaches, particularly with regards to inferring dated trees (separating rate and time), and because they assess support differently, using a posterior sample of trees inferred from the full data set as opposed to bootstrap resampling of the data set. This may be particularly relevant to RAD-seq data that contains missing data.

Bayesian inference programs often contain waaay too many options, which make them difficult to use. But it’s kind of necessary in order to make informed decisions when setting priors on parameters, as opposed to treating the analysis like a black box. That being said, I’ve written a sort of blackbox tool for running mrbayes analyses. Once you’ve established a set of parameters that are best for your analysis this tool is useful to then automate it across many loci (e.g., see ipa.treeslider()).

Required software

[1]:
# conda install ipyrad -c bioconda
# conda install mrbayes -c bioconda
# conda install toytree -c eaton-lab
[2]:
import ipyrad.analysis as ipa
import toytree

Short Tutorial:

This takes about 3-5 minutes to run on my laptop for a data set with 13 samples and ~1.2M SNPs and about 14% missing data.

[3]:
# the path to your NEXUS formatted file
nexfile = "../min10_outfiles/min10.nex"
nexfile = "/home/deren/Documents/ipyrad/tests/pedicularis/data10_outfiles/data10.nex"

[7]:
# init mrbayes object with input data and (optional) parameter options
mb = ipa.mrbayes(
    data=nexfile,
    clock=True,
    ngen=1000,
    samplefreq=100,
    nruns=1,
)

# print the mb nexus string; this can be modified by changing .params
print(mb.nexus_string)
#NEXUS
execute /home/deren/Documents/ipyrad/tests/pedicularis/data10_outfiles/data10.nex;

begin mrbayes;
set autoclose=yes nowarn=yes;

lset nst=6 rates=gamma;

prset clockratepr=lognorm(-7,0.6);
prset clockvarpr=tk02;
prset tk02varpr=exp(1.0);
prset brlenspr=clock:birthdeath;
prset samplestrat=diversity;
prset sampleprob=0.1;
prset speciationpr=exp(10);
prset extinctionpr=beta(2, 200);
prset treeagepr=offsetexp(1,5);

mcmcp ngen=1000 nrun=1 nchains=4;
mcmcp relburnin=yes burninfrac=0.25;
mcmcp samplefreq=100;
mcmcp printfreq=10000 diagnfr=5000;
mcmcp filename=/home/deren/Documents/ipyrad/newdocs/cookbook/analysis-mb/test.nex;
mcmc;

sump filename=/home/deren/Documents/ipyrad/newdocs/cookbook/analysis-mb/test.nex;
sumt filename=/home/deren/Documents/ipyrad/newdocs/cookbook/analysis-mb/test.nex;
end;

[8]:
# run the command, (options: block until finishes; overwrite existing)
mb.run(block=True, force=True)
job test finished successfully
[19]:
# (optional) draw your tree in the notebook
import toytree

# load from the .trees attribute or from the saved tree file
tre = toytree.tree(mb.trees.constre, tree_format=10)

# draw the tree
tre.draw(tip_labels_align=True);
39618_rex38362_rex35236_rex40578_rex35855_rex30556_thamno33413_thamno41954_cyathophylloides41478_cyathophylloides30686_cyathophylla29154_superba33588_przewalskii32082_przewalskii

Longer tutorial

By default several parameters are pre-set in the raxml object. To remove those parameters from the command string you can set them to None. Additionally, you can build complex raxml command line strings by adding almost any parameter to the raxml object init, like below. You probably can’t do everythin in raxml using this tool, it’s only meant as a convenience. You can always of course just write the raxml command line string by hand instead.

[ ]:

Cookbook

Most frequently used: perform 100 rapid bootstrap analyses followed by 10 rapid hill-climbing ML searches from random starting trees under the GTRGAMMA substitution model.

[ ]:

What’s next?

If you have reference mapped data then you should see the .treeslider() tool to infer trees in sliding windows along scaffolds; or the .extractor() tool to extract, filter, and concatenate RAD loci within a given window (e.g., near some known gene); or the .locus_optimizer() function to build longer loci for gene tree analyses by concatenating loci that are within N basepairs of each other on a scaffold.