Release Notes


April 15, 2021

  • demultiplex: Properly handle 3rad barcodes of different length


April 01, 2021

  • Allow snps_extractor to handle snps.hdf5 files with names not encoded as bytes
  • Fixing mismatch of #SBATCH and command parameters (#440)


March 19, 2021

  • ipa.structure: handle edge case with running more jobs with different K values


February 23, 2021

  • Handle i7 demux to strip trailing newline from barcode
  • Allow very short input fq files.
  • Fix project_dir inconsistency in merged assemblies.
  • Raise an error if setting a bad parameter in API mode #354


February 21, 2021

  • Don’t blank extra p3/p5 adapters in step2.check_adapters if filter_adapters != 3.


February 18, 2021

  • analysis.popgen: _collect_results() function to compile and organize results from the engines. It’s a little ugly.
  • analysis.popgen Change default minmap value to 4
  • analysis.popgen Chunking of loci and parallelization of analysis working.
  • analysis.popgen._fst_full() implemented. Had to slightly tweak the code that was there, and it works a bit different than the other sumstats, but whatever for now it works.
  • analysis.popgen implemented Dxy
  • analysis.popgen._process_locus_pops() to prepare data for pairwise summary statistics
  • analysis.popgen: Stats returned as dict instead of tuple, and remote processes dump full stats to pickle files.
  • Add a _process_locus function to remove a bunch of redundancy, and split consens bases to account for diploidy in all the calcs
  • analysis.popgen: If no imap then all samples are considered from the same population.
  • analysis.popgen: refactor _Watterson to accept sequence length and return raw and per base theta values.
  • analysis.popgen: Refactor into ‘public’ sumstat methods which can be called on array-like locus data, and ‘semi-private’ methods (e.g. _pi, or _Watterson) that are much more efficient but accept very specific information. The function will do some housekeeping per locus and call the ‘semi-private’ methods, for efficiency. Public methods are for testing.
  • analysis.popgen Tajima’s D function
  • analysis.popgen Watterson’s theta function
  • clustmap: Handle bam files with very large chromosomes (#435)
  • clustmap_across: Catch samtools indexing failure for large bam files and try again with .csi indexing with the flag.
  • Adopting the Processor() structure to enable parallelization. Also implemented nucleotide diversity.
  • analysis.popgen add parallelization support for calculating all sumstats per locus.
  • analysis.popgen _fst() is working for snp data
  • Add an option to LocusExtracter.get_locus to return a pd.DataFrame indexed by sample names
  • analysis.popgen getting it into current analysis tools format and starting to flesh it out.
  • CLI honors the -q flag now
  • analysis.sharing allow subsampling inside draw() to prevent recalculating data
  • analysis.shared allow sorting by mean pairwise sharing or missing
  • Allow analysis.sharing to reorder samples, and add a progress bar
  • Add new pairwise snp/missingness sharing analysis tool.
  • add locus sharing/missingness sharing analysis tool skel
  • pin numpy >=1.15, a known working version to address #429
  • Fix weird race condition with branching and pop_assign_files. See #430.


January 22, 2021

  • Fix #429 weird masking bug in older versions of numpy.
  • Add refs to the analysis.popgen tool.


January 16, 2021

  • replaced core.Assembly.database which actually wasn’t doing anything with snps_database and seqs_database to retain info about the hdf5 files in the assembly object
  • fix empirical api structure params format
  • Allow structure to accept vcf files and auto-convert to hdf5
  • fix oops in i7 demux cookbook


December 17, 2020

  • Fix off-by-one error in nexus output
  • update struct testdocs
  • Add Tajima’s D denominator equation to the popgen analysis tool, because I coded it before and it’s a nightmare.
  • use quiet in lex
  • plot posteriors range limits removed
  • Actually fix pca cookbook
  • fix malformatted pca cookbook
  • raxml w/ gamma rates


November 03, 2020

  • updating mb cookbook
  • bpp transform parser bugfix
  • add bpp dev nb
  • allow non-blocking calls again in bpp
  • bpp result split on delimiter
  • allow empty node dist in bpp
  • add option to extract as string/bytes
  • snps_extracter: report maf filter as unique filtered
  • snps extracter reports n unlinked snps
  • tmp dir for new cookbook docs
  • allow maf filter in structure
  • node_dists check for tips in bpp plot
  • new plotting tools for bpp results
  • added download util
  • add minmaf filter to snps_extracter, pca
  • additionall bpp summary tools


October 15, 2020

  • Fix nasty nasty bug in step 7 for issue #422


October 13, 2020

  • Fix mapping minus reads bug step 3


September 20, 2020

  • In structure._call_structure() self is not in scope, so now we pass in the full path to the structure binary.


September 10, 2020

  • fix oops in window_extracter
  • Allow scaffold idxs to be int
  • bugfix to skip scaffs with no hits even when end=None
  • end arg offset bugfix, may have affected size of windows in treeslider
  • Add handler for malformed loci in
  • fix to allow digest of single end RAD
  • Changed astral annotation to default (#417)
  • Update dependency documentation
  • baba2 draw fix
  • return canvas in draw
  • Fix an oops iff hackersonly.declone_PCR_duplicates && reference assembly
  • Merge pull request #409 from camayal/master
  • path fix for conda envs
  • path fix for conda envs
  • update for py38
  • Update faq.rst
  • pca docs
  • Change exit code for successful merging from 1 to 0.
  • Remove the png outfile from pca because it wants ghostscript package installed and apparently toytree doesn’t have that as a dependency. Annoying.
  • Update README.rst
  • Allow pca to write png as well.
  • Added fade option for blocks and tooltips
  • Merge pull request #1 from camayal/camayal-baba2-work
  • Changes in Drawing class


August 01, 2020

  • Add functionality to pca to allow adding easy text titles to the plots with the param
  • Document writing pca figure to a file
  • Allow pca.draw() to write out pdf and svg
  • force option to remove cftable in snaq
  • fix ntaxa in phy header for lex
  • mb ipcoal doc up
  • get locus phy uppers
  • changed default nboots in baba
  • Set pandas max_colwidth=250 to allow for very long sample names.
  • Fix oops in step 7 where trim_loci 3’ R1 wasn’t being used.
  • Allow imap pops to be specified as np.ndarray in snps_extracter
  • fix path snaq log
  • Fix branching oops
  • snaq working
  • snaq
  • network updated
  • testing network analysis
  • ast conda bin path fix
  • ast conda bin path fix
  • ast conda bin


June 29, 2020

  • revise installation docs to add conda-forge recommendation to reduce conflict errors
  • update README install instructions


June 26, 2020

  • bpp with ipcoal demoup
  • update tmx tool for toytree v2
  • conversion funcs in snps_extracter
  • binder add ipcoal and CF as first env
  • ipcoal terremix docs
  • tool and docs for genotype freq conversion output
  • tool and docs for genotype freq conversion output
  • baba ipcoal notebook
  • update 2brad docs slightly
  • Add force flag to force overwrite vcf_to_hdf5 (prevent redundant conversion)
  • better prior plots and transformers
  • mb more prior options and fixed tree support
  • pca allow setting colors
  • pca cna set opacity
  • pca cna set opacity
  • pca cna set opacity
  • pca cna set opacity
  • fix to allow custom colors in pca
  • docs
  • fix py2 compat
  • wex ints
  • fix api oops in window extracter docs for scaffold_idxs
  • wex start end as ints
  • re-supporting single or multiple locs/chrom in wex
  • option to not subsample SNPs
  • add mrbayes keepdir arg to organize files
  • simplified wex
  • extracter filters invariant sites when subsampled by IMAP
  • added pca panel plot
  • subsample loci jit for snps_extractor of linked SNPS
  • major baba2 update, to replace baba eventually
  • axes label fix and figt cleanup
  • handle int chrom names


May 31, 2020

  • off-by-one to ref pos in s3 applied again here


May 19, 2020

  • Fix off by 1 error in step 3 for PE data.
  • Fix toytree documentation in baba cookbook
  • Fix py2 compat by removing trailing commas from function argument lists in a couple of anaysis tools.
  • Fix oops in handling errors during convert_outputs


May 09, 2020

  • Fix nasty off-by-one error in reference positions
  • multiple default clock models
  • multiple default clock models
  • multiple default clock models
  • multiple default clock models
  • multiple default clock models
  • multiple default clock models
  • multiple default clock models
  • wex concat name drop fix
  • ts tmpdir renamed bootsdir
  • umap learn conda instructions
  • ts: added dryrun method
  • wex: remove print debug statement
  • Fix baba cookbook docs
  • add umap option
  • added pseudocode for a further imputer in prog
  • prettier bpp plot
  • pca analysis tool passes through quiet flag to subfunctions
  • warning about missing ref only up with no ref1 or ref2
  • merge fix
  • improving ipabpp summary funcs
  • ensure conda ipcluster bin on stop
  • bpp prior checks, new ctl build for 4.0, parsing results funcs
  • Add a helpful message if merging assemblies with technical replicates beyond step 3.
  • missing import
  • Handle empty imap population in snps_extractor
  • binary fix
  • syntaxerr on quiet
  • hide toytree dep in ast
  • ast better error message
  • ip assemble shows cluster on run by default
  • show_cluster func now listens to param arg
  • big update for bpp 4.0, uses lex
  • wex and ts both use idxs in param name now
  • simple astral run tool
  • lex: imap/minmap filtering fix
  • wex: imap/minmap filtering fix
  • fixed warning message
  • default minmap to 0 if imap and minmap empty
  • hide toyplot dependency
  • simple option to keep treefiles in treeslider
  • under the hood mods to pca draw func to make it more atomic
  • Update faq.rst
  • Set default filter_adapters parameter to 2
  • raise warning if ref+ or ref- and method not ref
  • notes on window extracter


April 17, 2020

  • 1 index POS in vcf output
  • minmap default is 0
  • bugfix: apply imapdrop only when imap
  • faster extraction and mincov after minmap in lex
  • mincov applies after minmap in wex
  • scaff arg entered later in cov tool
  • rmincov added to ts
  • option to keep all tmp files in treeslider
  • major fix to names sorting in wex
  • names offset by scaff length in cov plot
  • set default inner mate to 500 and use it unless user changes to None, in which case we estimate from reads
  • tmp working baba update
  • added locus extracter
  • option to keep all files in treeslider
  • added cov plot tool


April 05, 2020

  • Actually fix FASTQ+64 problem. Max fastq_qmax is 126, so this is set to 93 now (93+33=126)


April 02, 2020

  • Allow high fastq_qmax in pair merging to allow FASTQ+64 data


April 01, 2020

  • Record refseq mapped/unmapped for both SE & PE
  • wextract minmap+consred minmap default added
  • treeslider default args typed
  • tested working wextracter
  • baba merge
  • new dict for translation
  • updating bpp for 4.0


March 24, 2020

  • Fix snpstring length oops in .alleles outputs so they line up right.


March 24, 2020

  • Fix pd.as_matrix() call which is deprecated.
  • Force pca.draw() to honor the length of the color list if it is sufficiently long to color all samples in the imap, or at least use the length of the color list to set the value of the variable.
  • Fix oops in for importing msprime. Pushed it to Sim.__init__, since if you want to do baba, and don’t care about sims, then you shouldn’t have to install msprime.
  • h5py warning fix
  • use _pnames to use filtered names in run()


March 08, 2020

  • Allow more flexibility in sorted fastqs directory (DO NOT DELETE if it points to projectdir + _fastqs)
  • window extracter fix for multiple loci w/ reduce


March 04, 2020

  • Fix the treemix output so it actually generates. Took WAYYYY longer than I thought it would.
  • Update faq.rst
  • Update faq.rst


February 26, 2020

  • Fix off by 2 error in minsamp when using reference sequence
  • window extacter working for denovo loci
  • Cleaning up a TON of sphinx warnings from the docs and fixing a bunch of docs issues.
  • fix oops in (import sys)


February 19, 2020

  • Fix oops in step 6 which was leaving bad sample names hanging after alignment.


February 18, 2020

  • Set value when routing around hierarchical clustering for ref based assemblies.
  • disable hierarchical clustering until further testing
  • split samples evenly among cgroups for hierarch clust
  • digest genomes uses qual score B instead of b


February 16, 2020

  • subsample loci func added
  • counts rm duplicates in denovo and works with step6 skipping alignment of loci w/ dups
  • denovo paired aligned separately again
  • fastq qmax error in merge denovo fixed


February 15, 2020

  • Why can’t i figure out how to comment out this plotting code right? wtf!


February 15, 2020

  • commented out the import of the baba_plot plotting function and the baba.plot() method as these are broken rn, and also the plotting/baba_plotting routine tries to access toyplot in a way that breaks the conda build since toyplot isn’t a strict requirement. We could fix this in the future, but i’m tring to get the bioconda package to build successfully rn.


February 15, 2020

  • fix import checking for


February 15, 2020

  • Handle external imports in the baba module in the same way as the other analysis tools to fix the broken bioconda build.
  • Add a pops file to the ipsimdata.tar.gz because it’s always useful.
  • “Updating ipyrad/ to version - 0.9.35


February 12, 2020


February 12, 2020

  • Fix a bug in step 5 handling of RemoteError during indexing alleles.
  • Report debug traceback for all crashes, not just API. This is essentially making the debug flag useless in v.0.9


February 09, 2020

  • Roll back baba code to 0.7 version which doesn’t use the current analysis format, but which still works. Saved ongoing baba code as


February 06, 2020

  • Fix major oops in consens_se which failed step 5 every time. Bad!
  • In step 6 use the sample.files.consens info, rather than data.dirs to allow for merging assemblies after step 5 where data.dirs is invalid/empty.


February 04, 2020

  • #392 allow scaffold names to be int
  • Add sensible error handling if only a few samples fail step 5.
  • add docs to clustmap_across
  • fix for name re-ordering in window-extracter with multiple regions selected
  • added comments
  • added comments
  • added sys
  • Actually handle failed samples in step 2.
  • fix for new h5py warning
  • fix for new sklearn warning


January 19, 2020

  • Fix error in bucky (progressbar hell).
  • Add error handling in a couple cases if run() hasn’t been called, e.g. before draw, and also add the pcs() function as a convenience.
  • Removed support for legacy argument format from and updated the docs.
  • Allow PCA() to import data as vcf.
  • Add support for importing VCF into PCA


January 16, 2020

  • Fix whoops with bucky progressbars


January 15, 2020

  • Fix bucky progressbar calls.
  • Fixed progressbar calls in
  • Conda install instructions were wrong.
  • add future print import to
  • Add releasenotes to the TOC


January 12, 2020

  • Fix to actually record releasenotes
  • Fix releasenotes in versioner script
  • Fix releasenotes


January 12, 2020

  • Fix releasenotes


January 01, 2020

  • During steps 5 & 6 honor the filter_min_trim_len parameter, which is useful in some cases (2brad).
  • In step 3, force vsearch to honor the filter_min_trim_len param, otherwise it defaults to –minseqlength 32, which can be undesirable in some cases.


December 31, 2019

  • concatedits files now write to the tmpdir, rather than edits (#378), also handle refmap samples with no reads that map to the reference, also change where edits files are pulling from during PE merging to allow for assembly merging after step 2. phew.
  • digested genome bugfix - check each fbit 0 and -1
  • digest genomes nscaffolds arg support
  • docs cookbook updates
  • new consens sampling function and support for window extracter to concatenate
  • comment about zlib


December 24, 2019

  • Fix IPyradError import


December 24, 2019

  • Regress


December 23, 2019

  • Add support for .ugeno file
  • Add support for .ustr format
  • Remove duplication of code in write_str()
  • Fix docs for output formats
  • Add back output formats documentation


December 23, 2019

  • Fix stupid bug introduced by fe8c2dfc282e177a7c18f6e2e23ef84d284a9e3f


December 18, 2019

  • Expose analysis.baba for testing
  • fasterq-dump seems to be only avail on linux
  • Fix bug in handling sample names in the pops file. re: #375.
  • Allow faidict scaffold names to be int (cast to dtype=object)


December 03, 2019

  • Fix step 6 with pop_assign_file
  • fix for empty samples after align
  • list missing as ./. in VCF (like we used to)


November 23, 2019

  • Fix oops handling missing data in vcf to hdf5
  • mb binary path bugfix
  • treeslider mb bugfix
  • treeslider mb working
  • treemix support for conda env installations
  • additional drawing options for pca
  • raxml cookbook update
  • tetrad notebook updated
  • Fix oops in checking for lowercase overhangs seqs
  • Fix a nasty stupid bug setting the overhang sequence
  • Add back the docs about merging
  • Error checking in step 5.
  • Forbid lowercase in overhang sequence


November 04, 2019

  • Ooops. Allow popsfile w/o crashing, and allow populations to be integer values
  • cookbooks added link to nb
  • pca stores results as attr instead or returning


October 31, 2019

  • commented fix of optim chunksize calc
  • treeslider now working with mb
  • toggle to write in nexus
  • mb saves convergence stats as df
  • single-end mapping infiles bugfix
  • pca cookbook update
  • added fasttree tool
  • update cookbooks index
  • cookbooks updated headers
  • warning about denovo-ref to use new param
  • clustmap keeps i5s and can do ref minus
  • window extracter updated
  • mb load existing results and bugfix result paths
  • update treemix and mb docs
  • Fix calculation of optim during step 6 aligning. 3-4x speedup on this medium sized simulated data i’m working on.
  • Fix oops in how optim was being counted. Was counting using _unsorted_ seed handle, I switched it to use sorted and now it works more like expected
  • Clean up clust.txt files after step 3 finishes
  • Update docs and parameter descriptions to reflect new reality of several params
  • Add handlers for denovo +/- reference.
  • vcf tool docs
  • tools docs update
  • enable vcf_to_hdf5
  • pca reps legend looks nice
  • added replicate clouds to pca
  • vcf to hdf5 converter tested empirically
  • Add from the hotfix branch
  • Pull from the correct repo inside meta.yaml
  • VCF 9’s fixed to be .
  • Add back tetrad docs
  • default hackers set to yes merge tech reps, and cleanup
  • behavior for duplicates in barcodes file
  • bugfix: error reporting for barcodes within n
  • find binary from env or user entered
  • find ipcluster from conda env bin
  • bugfix: allow demux i7s even if datatype=pair3rad
  • add the notebook tunnel docs back
  • pedicularis cli tutorial updated
  • df index needed sorting
  • Allow sample names to be integers. wtf, how did this never come up before?
  • Add the McCartney-Melstad ref to the faq
  • fix typo in bpp cookbook
  • Fix bpp Params import
  • docs nav bar cleanup
  • testing binder w/o treemix
  • Add advanced tutorial back (?), maybe as a placeholder.
  • Remove references to smalt and replace with bwa. That’s some old-ass junk!
  • Fixed the script and added the faq.rst to the newdocs


October 05, 2019

  • binder update
  • docs update
  • add bpp docs
  • indentation in docs
  • add i7 demux cookbook
  • mroe analysis cookbooks
  • analysis cookbooks
  • merged clustmap
  • Fix a nasty bug with stats for assemblies where chunks end up empty after filtering
  • Fix step 3 to allow some tmpchunks to be empty without raising an error during chunk aligning
  • Fix a bug in
  • Fix a nasty error in jointestimate.stackarray() where some long reads were slipping in over the maxlen length and causing a broadcast error


  • py2 bug: print missing as float
  • py2 bug fix: database ordering
  • allow iterable params object for py2 and 3
  • Fix an edge case to protect against empty chunks post-filtering during step 7
  • install docs update
  • Fix CLI so merge works
  • Fix the max_shared_Hs param description to agree with only having one value, rather than 1 value perper R1/R2
  • Ooops. checked in a pdb.set_trace in write outfiles. sorry!
  • add deps to newdocs
  • bug fix for a rare trim that leaves >=1 all-N rows. Filter it.
  • documenting a hard coded backward compatibility in write_output.Processor()
  • hdf5 formatting for window slider in both denovo and ref
  • sratools up to date with CLI working too
  • Don’t pester about mpi4py if you’re not actually using MPI (CLI mode)
  • Allow for user to not input overhang sequences and jointestimate will just proceed with the edges included.
  • chunked downloads bug fix

<sunspots cause discontinuity in version history>


March 09, 2019

  • Fix pca for scikit 1.2.0 API and a few minor fixes.
  • Update faq.rst
  • Update faq.rst


January 21, 2019

  • Fix nasty ValueError bug in step 7 (re: merged PE loci)
  • Update faq.rst
  • Update faq.rst
  • Update faq.rst
  • Adding more docs
  • Starting list of papers related to assembly parameters
  • Remove ‘skip’ flags from meta.yaml, because False is default now
  • Add funcsigs dependency
  • Fix so max locus length is autodetected from the data, instead of being fixed at 300
  • Adding a conversion script which takes in a directory of nexus alignments and writes out a .loci file. This is as stupid as possible and it makes a lot of assumptions about the data, so don’t be surprised if it doesn’t work right.
  • added missing dependency on cutadapt (#314)
  • Add support for finding bins in a virtualenv environment installed with pip
  • add missing requirement: dask[array] (#313)
  • Update faq.rst
  • Update faq.rst
  • fix branching docs
  • Fix a nasty bug in sra tools if you try to dl more than 50 or 60 samples.
  • fix dox
  • Fix references to load_assembly to point to load_json
  • Removing docs of preview mode
  • Purge references to preview mode. Clean up some deprecated code blocks in demux.
  • Remove import of util.* from load, and include only the few things it needs, remove circular dependency.
  • Add docs about structure parallel runs failing silently
  • Removing the restriction on ipyparallel version to obtain the ‘IPython cluster’ tab in notebooks.
  • Adding docs about engines that die silently on headless nodes
  • Add title and save ability to pca.plot()
  • Make pca.plot() less chatty
  • Forbid nPCs < n samples
  • Update ipyrad meta.yaml to specify ipyparallel, and scikit-allel version.
  • Fix pis docs in faq
  • Update full_tutorial_CLI.rst
  • Update full_tutorial_CLI.rst
  • Update full_tutorial_CLI.rst
  • Update full_tutorial_CLI.rst
  • Adding scikit-allel dependency for pca analysis tool
  • Update cookbook-PCA-pedicularis.ipynb
  • Fix a bug that was causing _link_fastqs to fail silently.
  • fixing inconsistencies in the pedicularis CLI tutorial
  • Big update to the PCA cookbook.


June 18, 2018

  • Add functions for missingness, trim missing, and fill missing.
  • Adding PCA cookbook
  • pcs are now stored as pandas, also, you can specify ncomps


June 15, 2018

  • Add distance plot, and pca.pcs to hold coordinates per sample
  • remove some crust from pca.pywq


June 14, 2018

  • Adding analysis.pca
  • Allow passing in just a dict for specifying populations to _link_populations(), and assume all minsamps = 0
  • Some of step 2 docs were outdated
  • Fix stupid link
  • Adding some docs about MIG-seq.
  • Damn this cluster config mayhem is a mess.
  • Fix faq re pyzmq
  • adding docs about max_snp_locus settings
  • Fix merge conflict
  • Add docs to fix the GLIBC error
  • Docs for r1/r2 not the same length


May 17, 2018

  • nb showing fix for 6-7 branching
  • nb showing fix for 6-7 branching
  • fixed branching between 6-7 when using populations information
  • suppress h5py warning
  • Allow sample names to be numbers as well.


May 03, 2018

  • Better handling of utf-8 in sample names by default.
  • Add docs in the faq about the empty varcounts array
  • Catch an exception in sratools raised by non-existant sra directory.
  • Add HDF5 file locking fix to the faq.
  • Add docs to peddrad notebook.
  • Adding PE-ddRAD analysis notebook.
  • Add the right imports error message to the structure analysis tool.


February 21, 2018

  • some releasenotes fixes
  • Fix filter_min_trim_len not honoring the setting in the params file.


February 13, 2018

  • bug fix to
  • updated tetrad cookbook
  • ipa: structure has max_var_multiple option, and documentation now includes it.
  • update baba cookbook
  • API user guide update
  • bug fix: allow for ‘n’ character in reftrick
  • ipa: can reload structure results, better API design for summarizing results, better documentation
  • allow subsetting in baba plot, and bug fix for generate_tests dynamic func
  • undo dumb commit
  • added –download to the docs example


January 23, 2018

  • Fix step 2 with imported fastq ungzipped.
  • docs update
  • update ipa structure notebook
  • update ipyparallel tutorial
  • update ipa structure notebook
  • docs updates
  • improved cleanup on sra tools
  • updated bucky cookbook
  • updated –help for sra download
  • updated docs for sra download


January 09, 2018

  • fixed gphocs output format
  • A note to add a feature for the future.
  • abba baba cookbook updated for new code
  • updated baba plot to work better with updated toytree
  • baba: added functions for parsing results of 5-taxon tests and improved plotting func.
  • added notes
  • added CLI command to do quick downloads from SRA. Useful for tutorials especially
  • update bpp cookbook
  • added functions to calculate Evanno K and to exlude reps based on convergence stats
  • added funcs to bpp tool to load existing results and to parse results across replicates
  • ipp jobs are submitted as other jobs finish so that RAM doesn’t fill up with queued arrays


November 16, 2017

  • bugfix; error was raised in no barcodes during step2 filtering for gbs data. Now just a warning is printed
  • Fixed structure conda meta.yaml
  • Fix ipcluster warning message.
  • Adding to the faq explaining stats better
  • new working meta.yaml
  • trying alternatives with setup files for jupyter conda bug fix
  • updating stuff to try to fix jupyter missing in conda install


November 13, 2017

  • allow user to set bpp binary path if different from default ‘bpp’
  • skip concat edits of merged reads if merge file exists unless force flag is set
  • added a progress bar tracker for reference indexing
  • speed improvement to refmapping, only tests merge of read pairs if their mapped positions overlap
  • update to docs
  • update API userguide
  • added twiist tool
  • update bpp notebook
  • tetrad bug fix for OSX users for setting thread limit
  • added check for structure path in
  • allow setting binary path and check for binary added to
  • Update requirements.txt
  • Added to the faq how to fix the GLIBC error.
  • Fix logging of superints shape.
  • Test for samples in the populations file not in the assembly.


October 28, 2017

  • Properly handle empty chunks during alignment. Very annoying.


October 28, 2017

  • Fix SE reference bug causing lots of rm_duplicates.
  • Lowered min_se_refmap_overlap and removed useless code to recalibrate it based on filter_min_trim_len.
  • Actually fix conda package.
  • aslkfljsdjsdffd i don’t know how this shit works.
  • Fixing build still.
  • Fix typo in meta.yaml.


October 01, 2017

  • Fix conda build issue.


September 28, 2017

  • Fix orientation of R2 for pe refmap reads.
  • better error reporting, and ensure * at top of stacks
  • quickfix from last commit, keep first st seq after pop to seed in align
  • edge trim in s7 cuts at 4 or minsamp
  • added adapter-barcode order checking for cases where merged samples, and pegbs data is analyzed either as pe or forced into se.
  • update to gbs edge trimming, stricter filtering on partial overlapping seqs
  • Add a comment line to the pysam conda build to make it easier to build on systems with older glibc.
  • “Updating ipyrad/ to version - 0.7.13
  • API style modifications to tetrad


September 05, 2017

  • API style modifications to tetrad


September 04, 2017

  • Add support for optional bwa flags in hackersonly.
  • Force resetting the step 6 checkpointing if step 5 is re-run.
  • fix for max_shared_Hs when a proportion instead of a fixed number. Now the proportion is applied to every locus based on teh number of samples in that locus, not the total N samples
  • access barcode from assembly not sample unless multiple barcodes per sample. Simpler.
  • added back in core throttling in demux step b/c it is IO limited
  • fix to progress bar fsck, and fix to cluster location used in step4 that was breaking if assemblies were merged between 3 and 4
  • step 6 clustering uses threading options from users for really large systems to avoid RAM limits
  • fix for progress bar printing in tetrad, and to args entry when no tree or map file
  • fix to default ncbi sratools path


August 28, 2017

  • update ezrad notebook
  • ezrad-test notebook up
  • Update cookbook-empirical-API-1-pedicularis.ipynb
  • big improvements to sratools ipa, now better fetch function, easier renaming, and wraps utility to reassign ncbi dump locations
  • fix for bucky bug in error reporting
  • wrote tetrad CLI to work with new tetrad object
  • rewrite of tetrad, cleaner code big speed improvements
  • allow more flexible name entry for paired data, i.e., allow _R1.fastq, or _1.fastq instead of only _R1_.fastq, etc.
  • Fixed denovo+reference assembly method.
  • update bpp cookbook
  • update bpp cookbook
  • “Updating ipyrad/ to version - 0.7.11
  • removed repeat printing of error statements
  • added more warning and reports to bpp analysis tool


August 14, 2017

  • removed repeat printing of error statements
  • added more warning and reports to bpp analysis tool


August 14, 2017

  • better error checking in bucky run commandipa tools
  • added workdir default name to sra tools ipa tool
  • improved error checking in step 6
  • bugfix for VCF output where max of 2 alternative alleles were written although there could sometimes be 3


August 08, 2017

  • fix misspelled force option in ipa bucky tool
  • bpp ipa tool changed ‘locifile’ arg to ‘data’ but still support old arg, and removed ‘seed’ arg from run so that the only ‘seed’ arg is in paramsdict
  • bugfix to not remove nex files in bucky ipa tool


August 07, 2017

  • cleaner shutdown of tetrad on interrupt. Bugfix to stats counter for quartets sampled value. Cleaner API access by grouoping attributes into params attr.
  • cleanup rawedit to shutdown cleaner when interrupted
  • modified run wrapper in assembly object to allow for cleaner shutdown of ipyclient engines
  • bug fix so that randomize_order writes separate seqfiles for each rep in bpp analysis tool
  • Adding error handling, prevent tmp files being cleaned up during DEBUG, and fix tmp-align files for PE refmap.
  • Derep and cluster 2brad on both strands.
  • Actually fix refmap PE merging.
  • Fix merging for PE refmap.
  • Add a switch to _not_ delete temp files if DEBUG is on. Helpful.
  • 2 new merge functions for PE refmap. One is slowwwww, the other uses pipes, but doesn’t work 100% yet.
  • New hackersonly parameter to switch merging PE after refmap.
  • bugfix to ipa baba plotting function for updated toyplot
  • Reduce minovlen length for merging reference mapped PE reads.
  • docs update
  • docs update
  • improved design of –ipcluster flag in tetrad CLI
  • improved design of –ipcluster flag in tetrad CLI
  • improved design of –ipcluster flag in ipyrad CLI


July 28, 2017

  • bpp randomize-order argument bugfix
  • added .draw to treemix object
  • update tuts


July 27, 2017

  • Proper support for demux 2brad.


July 27, 2017

  • Fix very nasty refmap SE bug.
  • update tutorials – added APIs
  • update tutorials – added APIs
  • testing MBL slideshow
  • API cookbooks updated
  • cleanup of badnames in sratools


July 26, 2017

  • Added error handling in persistent_popen_align3
  • Catch bad seeds in step 6 sub_build_clustbits().


July 26, 2017

  • Actually fix the step 6 boolean mask error.
  • Fix for boolean mask array length bug in step 6.
  • add -noss option for treemix ipa
  • mods to tetrad and sratools ipa
  • ensure ints not floats for high depth base counts
  • sratools updates
  • improvements to sratools
  • added extra line ending to step7 final print statement
  • add dask to environment.yaml
  • added sratools to ipyrad.analysis


July 23, 2017

  • Better handling for restarting jobs in substeps of step 6.
  • Fixed the fscking pysam conda-build scripts for osx.
  • Add patch for pysam build on osx
  • Fix for conda-build v3 breaking meta.yaml
  • Using htslib internal to pysam and removing bcftools/htslib/samtools direct dependencies.
  • Add force flag to force building clusters if utemp exists.
  • conda recipe updates
  • updateing conda recipe
  • ensure stats are saved as floats
  • fix to bug introduced just now to track progress during s6 clustering
  • Fix an issue with merged assemblies and 3rad.
  • fix for step 6 checkpoints for reference-based analyses
  • conda recipe tweaking
  • conda recipe updates
  • fix to conda recipes
  • update bucky cookbook
  • added shareplot code
  • bucky ipa update remove old files
  • conda recipe updated
  • “Updating ipyrad/ to version - 0.7.2
  • update conda recipe
  • update pysam to correct version


July 10, 2017

  • update conda recipe
  • update pysam to correct version (
  • added bucky ipa code
  • bucky cookbook up
  • automatically merges technical replicates in demux
  • check multiple barcodes in samples that were merge of technical replicates
  • fix for alleles output error
  • Added checkpointing/restarting from interrupt to step 6.
  • Added cli detection for better spacer printing.
  • bpp bug fixe to ensure full path names
  • API user guide docs update.
  • cookcook updates tetrad and treemix
  • new _cli, _checkpoint, and _spacer attributes, and new ‘across’ dir for step 6
  • load sets cli=False by default, and it saves checkpoint info
  • allow profile with ipcluster
  • treemix report if no data is written (i.e., all filtered)
  • fix to allow setting nquartets again.
  • Better integration of API/CLI.
  • Bug fix to Tree drawing when no boots in tetrad.
  • tetrad fix for compatibility with new toytree rooting bug fix for saving features.
  • cli is now an attribute of the Assembly object that is set to True by __main__ at runtime, otherwise 0.
  • cluster_info() now prints instead of return
  • rehaul of bucky ipa tools
  • print cluster_info now skips busy engines
  • unroot tetrad tree on complete


June 16, 2017

  • Actually handle SE reference sequence clustering.
  • Prevent empty clust files from raising an error. Probably only impacts sim data.
  • If debug the retain the bed regions per sample as a file in the refmap directory.
  • updated tunnel docs
  • HPC tunnel update
  • support for parsing supervised structure analyses in ipa
  • HPC tunnel docs update
  • update analysis docs
  • ipa.treemix params
  • more params added to ipa.treemix
  • cookbook update treemix
  • fix to conda rec
  • treemix ipa updates


June 15, 2017

  • put a temporary block on denovo+ref
  • added treemix ipa funcs
  • update conda recipe
  • added notebook for structure with popdata
  • updated tetrad notebook
  • update bpp notebook
  • fix missing newline in alleles
  • ipa structure file clobber fix
  • cleaner and more consistent API attr on ipa objects
  • Added docs for the -t flag.
  • fix in ipa.structure so replicate jobs to do not overwrite
  • Fix bad link in docs.
  • better method to find raxml binary in analysis tools
  • consens bugfix for new ipmlementation
  • ensure h5 files are closed after dask func
  • fix to parse chrom pos info from new consens name format
  • removed deprecated align funcs
  • removed hardcoded path used in testing
  • removed deprecated align funcs. Made it so build_clusters() does nothing for ‘reference’ method since there is a separate method in ref for chunking clusters
  • some new simpler merge funcs
  • make new ref funcs work with dag map
  • new build funcs usign pysam


June 03, 2017

  • Step 6 import fullcomp from util.


June 01, 2017

  • Step 4 - Handle the case where no clusters have sufficient depth for statistical basecalling.


May 30, 2017

  • Fix a bug in refmap that was retaining the reference sequence in the final clust file on rare occasions.


May 25, 2017

  • Bug fix for “numpq” nameerror


May 24, 2017

  • bug fix for numq error in s5


May 22, 2017

  • Fixed bug in vcf output for reference mapped.


May 19, 2017

  • Fix new chrom/pos mechanism to work for all assembly methods.
  • Change chroms dtype to int64. Reference sequence CHROM is now 1-indexed. Anonymous loci are -1 indexed.
  • Switch chroms dataset dtype to int64.
  • Fix for alleles output.
  • Fix nasty PE refmap merging issue.
  • Fix massive bug in how unmapped reads are handled in refmap.
  • added md5 names to derep and simplified code readability within pairmerging
  • fix for binary finder
  • added dask to conda recipe
  • added dask dependency


May 10, 2017

  • added dask dependency
  • vcf building with full ref info
  • bug fix to alleles output and support vcf chrompos storage in uint64
  • simpler and slightly faster consens calls and lower memory and stores chrompos as uint64s
  • chrompos now stored as uint64
  • reducing memory load in race conditions for parallel cutadapt jobs
  • Squash Cosmetic commit logs in releasenotes. Add more informative header in step 7 stats file.
  • Trying to catch bad alignment for PE in step 6.


May 04, 2017

  • Handle empty locus when building alleles file. Solves the ValueError “substring not found” during step 7.
  • workshop notebook uploaded


May 03, 2017

  • update to analysis tools
  • accepted the local bpp notebook
  • complete bpp notebook up
  • notebook updates
  • raxml docs
  • raxml cookbook up
  • docs update
  • raxml docs updated
  • links to miniconda updated
  • fix for tetrad restarting bootstraps
  • removed bitarray dependency
  • adding restart checkpoints in step6


April 26, 2017

  • support for alleles file in bpp tools
  • align names in alleles output
  • bugfix to name padding in .alleles output
  • slight delay between jobs
  • bpp store asyncs
  • bpp store asyncs
  • update bpp cookbook
  • testing html
  • testing html
  • new filter_adapters=3 option adds filtering of poly-repeats
  • conda recipe update for cutadapt w/o need of add-channel


April 25, 2017

  • alleles output now supported
  • Additional documentation for max_alleles_consens parameter.
  • support alleles output, minor bug fixes for step6, much faster alignment step6
  • lower default ‘cov’ value for vsearch within clustering in RAD/ddrad/pairddrad
  • tetrad bug, use same ipyclient for consensus tree building
  • store asyncs in the structure object
  • allow passing in ipyclient explicitly in .run() in tetrad
  • fix for time stamp issue in tetrad
  • Better testing for existence of all R2 files for merged assemblies.
  • notebook updates
  • tunnel docs update
  • updated HPC docs
  • tetrad cookbook updated
  • HPC docs update
  • bpp cookbook good to go
  • update tetrad notebook
  • missing import


April 18, 2017

  • Actually fix gphocs output.
  • allow passing in ipyclient in API
  • baba notebook update
  • cleaner api for bpp object
  • new analysis setup
  • updated analysis tools without ete
  • adding doc string


April 13, 2017

  • Fixed CHROM/POS output for reference mapped loci.


April 13, 2017

  • Fix gphocs output format.
  • If the user removes the population assignment file blank out the data.populations dictionary.


April 10, 2017

  • Prevent versioner from including merge commits in the release notes cuz they are annoying.
  • Add the date of each version to the releasenotes docs, for convenience.
  • Experimenting with adding date to releasenotes.rst
  • added more attributres to tree
  • change alpha to >=
  • tip label and node label attributes added to tree
  • tetrad ensure minrank is int
  • fix structure obj removing old files
  • lots of cleanup to baba code
  • edit to analysis docs
  • Handle pop assignment file w/o the min sample per pop line.
  • merge conflict resolved
  • bug fix for tuples in output formats json
  • sim notebook started
  • cookbook abba-baba updated
  • tetrad cookbook api added
  • added option to change line spacing on progress bar
  • major overhaul to ipyrad.analysis and plotting
  • option to buffer line spacing on cluster report
  • Removed confusing punctuation in warning message
  • Make vcf and loci output files agree about CHROM number per locus.
  • Cosmetic change to debug output.
  • Make the new debug info append instead of overwrite.
  • Fix annoying bug with output_format param I introduced recently.
  • Add platform info to default log output on startup.
  • Actually write the error to the log file on cutadapt failure.
  • Write the version and the args used to the log file for each run. This might be annoying, but it could be useful.
  • bpp randomize option added to write
  • adding bpp cookbook update
  • updating analysis tools for new bpp baba and tree
  • merge resolved
  • analysis init update for new funcs
  • apitest update
  • abba cookbook update
  • update bpp cookbook
  • small edit to HPC docs
  • tetrad formatting changing
  • updated analysis tools cookbooks
  • docs analysis page fix
  • added header to bpp convert script


March 27, 2017

  • Fix a bug in PE refmapping.
  • Fix error reporting if when testing for existence of the clust_database file at beginning of step 7.
  • Fix bug reading output formats from params file.
  • Add docs for dealing with long running jobs due to quality issues.
  • bug fix for output format empty
  • structure cookbook update
  • pushing analysis tools
  • svg struct plot added
  • structure cookbook updates
  • struct image added for docs
  • update structure cookbook for new code
  • Actually fix the output_format default if blank.
  • Set blank output formats in params to default to all formats.
  • Add a filter flag for samtools to push secondary alignments to the unmapped file.
  • rm old files
  • shareplot code in progress
  • work in progress baba code notebook
  • a decent api intro but bland
  • beginnings of a migrate script
  • raxml docs updated, needs work still
  • analysis docs page update
  • structure parallel wrapper scripts up in analysis
  • simplifying analysis imports
  • cleanup top imports
  • Adding support for G-PhoCS output format.
  • Fix wacky reporting of mapped/unmapped reads for PE.
  • Document why we don’t write out the alleles format currently.
  • module init headers
  • added loci2cf script
  • update structure notebook with conda recipes
  • fileconversions updated
  • loci2cf func added
  • cookbook bucky docs up
  • loci2multinex and bucky notebook updated
  • BUCKy cookbook updated
  • bucky conda recipe up
  • fix to API access hint
  • cleaner code by moving msgs to the end
  • slight modification to paired adapter trimming code
  • cleaner Class Object in baba
  • minor change to cluster_info printing in API


  • Filter reference mapped reads my mapq < 30, and handle the occasional malformed region string in bam_region_to_fasta.
  • Handle PE muscle failing alignment.
  • Cosmetic faq.rst
  • Cosmetic faq.rst
  • Cosmetic
  • Cosmetic docs changes.
  • Add docs for step 3 crashing bcz of lack of memory.
  • Catch a bug in alignment that would crop up intermittently.
  • removed the –profile={} tip from the docs
  • Fix notebook requirement at runtime error.
  • Fix formatting of output nexus file.


  • Changed the sign on the new hackersonly parameter min_SE_refmap_overlap.
  • added a persistent_popen function for aligning, needs testing before implementing
  • debugger in demux was printing way too much
  • bugfix for empty lines in branching subsample file
  • Add a janky version checker to nag the user.


  • Actually remove the reference sequence post alignment in step 3. This was BREAKING STUFF.
  • updated notebook requirement in conda recipe
  • Handle conda building pomo on different platforms.
  • Ooops we broke the script. Now it’s fixed.
  • conda recipe updates
  • conda recipe updates
  • conda recipe updates
  • testing git lfs for storing example data


  • Fixed stats reported for filtered_by_depth during step 5.
  • Add new hackersonly parameter min_SE_refmap_overlap and code to to forbid merging SE reads that don’t significantly overlap.
  • Use preprocessing selectors for linux/osx for clumpp.
  • Add url/md5 for mac binary to clumpp meta.yaml
  • conda recipes update
  • getting ipyrad to conda install on other envs
  • updating versions for conda, rtd,
  • moving conda recipes
  • conda recipe dir structure
  • bpp install bug fix
  • bpp recipe fix
  • conda recipes added
  • Roll back change to revcomp reverse strand SE hits. Oops.
  • fix merge conflect with debug messages.
  • Fix a bug in refmap, and handle bad clusters in cluster_within.
  • Actually revcomp SE - strand reads.
  • updated HPC docs
  • updated HPC docs
  • updated HPC docs


  • bug fix in building_arrays where completely filtered array bits would raise index error -1
  • tunnel docs updates
  • method docs updated to say bwa
  • some conda tips added
  • fix for name parsing of non gzip files that was leaving an underscore
  • Allow get_params using the param string as well as param index
  • Update hpc docs to add the sleep command when firing up ipcluster manually.
  • Fixed some formatting issues in the FAQ.rst.


  • Fixed 2 errors in steps 3 and 4.
  • “Updating ipyrad/ to version - 0.6.4
  • left a debugging print statement in the code
  • removed old bin
  • “Updating ipyrad/ to version - 0.6.4


  • left a debugging print statement in the code
  • removed old bin
  • “Updating ipyrad/ to version - 0.6.4



  • update to docs parameters
  • bug fix for merging assemblies with a mix of same named and diff named samples


  • Fixed a bug i introduced to assembly. Autotroll.


  • Fix subtle bug with migration to trim_reads parameter.


  • Fixed malformed nexus output file.
  • cookbook updates to docs
  • updated cookbook structure pedicularis


  • trim reads default 0,0,0,0. Similar action to trim loci, but applied in step 2 to raws
  • trim_reads default is 0,0
  • raise default cov/minsl for gbs data to 0.5 from 0.33
  • prettifying docs
  • pedicularis docs update v6 way way faster
  • updated tutorial
  • fixing links in combining data docs
  • updating tutorial for latest version/speed
  • added docs for combining multiple plates
  • added docs for combining multiple plates
  • added docs for combining multiple plates
  • Removed from output formats defaults (it doesn’t do anything)
  • baba cookbooks [unfinished] up
  • finally added osx QMC and fixed bug for same name and force flag rerun
  • put back in a remove tmpdirs call
  • removed a superfluous print statement
  • bug fix to mapfile, now compatible with tetrad
  • paramsinfo for new trimreads param
  • branching fix for handling new param names and upgrading to them
  • better handling of pairgbs no bcode trimming. Now handles –length arg
  • better handling of KBD in demux. Faster compression.
  • forgot sname var in cutadaptit_single
  • Fix step 2 for PE reads crashing during cutatapt.
  • Test for bz2 files in sorted_fastq_path and nag the user bcz we don’t support this format.
  • Step 1 create tmp file for estimating optim chunk size in project_dir not ./
  • Add force flag to mapreads(), mostly to save time on rerunning if it crashes during finalize_mapping. Also fixed a nasty bug in refmapping.
  • Added text to faq about why PE original RAD is hard to assemble, cuz people always ask.
  • Better handling of loci w/ duplicate seqs per sample.
  • Fix a bug that munged some names in branching.
  • merge conflict
  • modified for new trim param names
  • support for new trim_loci param
  • support for updated cutadapt
  • bugfix for hackerdict modify of cov
  • chrom only for paired data
  • changed two parameter names (trims)
  • tested out MPI checks
  • cutadapt upgrade allow for –length option
  • Moved log file reset from init to main to prevent -r from blanking the log >:{
  • Moved log file reset from __init__ to __main__
  • Don’t bother aligning clusters with duplicates in step 6.
  • baba update
  • remove print statement left in code
  • same fix to names parser, better.
  • added comment ideas for chrompos in refmap
  • bug fix, Sample names were being oversplit if they had ‘.’ in them
  • test labels, improved spacing, collapse_outgroups options added to baba plots
  • Fix debug message in refmap and don’t raise on failure to parse reference sequence.
  • attempts to make better cleanup for interrupt in API
  • some cleanup to calling steps 1,2 funcs
  • speed testing demux code with single vs multicore
  • moved setting of [‘merged’] to replace filepath names to Assembly instead of main so that it also works for the API
  • added a np dict-like arr to be used in baba, maybe in ref.
  • baba plotting functions added
  • Better handling of tmpdir in step 6.
  • added baba cookbook
  • only map chrom pos if in reference mode
  • new batch and plotting functions
  • trim .txt from new branch name if accidentally added to avoid Assembly name error
  • added a name-checker to the branch-drop CLI command
  • Fixed legend on Pedicularis manuscript analysis trees.
  • Cosmetic change
  • Adding manuscript analysis tree plotting for empirical PE ddRAD refmap assemblies.
  • More or less complete manuscript analysis results.
  • Actually fix vcf writing CHROM/POS information from refseq mapped reads.
  • Handle monomorphic loci during vcf construction.
  • removed deprecated subsample option from jointestimate
  • –ipcluster method looks for default profile and cluster-id instance
  • clode cleanup and faster haploid E inference
  • simplified cluster info printing
  • enforce ipyclient.shutdown at end of API run() if engine jobs are not stopped
  • code cleanup. Trying to allow better KBD in step2
  • lots of cleanup to DAG code. Now ok for individual samples to fail in step3, others will continue. Sorts clusters by derep before align chunking
  • Allow assemblies w/o chrom/pos data in the hdf5 to continue using the old style vcf position numbering scheme.
  • Don’t print the error message about samples failing step 4 if no samples actually fail.
  • Set a size= for reference sequence to sort it to the top of the chunk prior to muscle aligning.
  • Allow samples with very few reads to gracefully fail step 4.
  • Better error handling during reference mapping for PE.
  • Fix error reporting in merge_pairs().
  • Add CHROM/POS info to the output vcf file. The sorting order is a little wonky.
  • Handle empty project_dir when running -r.
  • a clean bighorse notebook run on 100 cores
  • Fix minor merge conflict in ref_muscle_chunker.
  • Use one persistant subprocess for finalizing mapped reads. Big speed-up. Also fix a stupid bug in estimating insert size.
  • Better handling of errors in merge_pairs, and more careful cleanup on error.
  • If /dev/shm exists, use it for finalizing mapped reads.
  • Handle a case where one or the other of the PE reads is empty.
  • cleaner print cpus func
  • Adding a new dataset to the catg and clust hdf5 files to store CHROM and POS info for reference mapped reads.
  • added cleanhorse notebook
  • working on notebook
  • cleanup up redundancy
  • MUCH FASTER STEP 4 using numba array building and vectorized scipy
  • MUCH FASTER MUSCLE ALIGNING. And a bug fix to a log reporter
  • bug fix to error/log handler
  • Finish manuscript refmap results analysis. Added a notebook for plotting trees from manuscript Pedicularis assembly.
  • Better checking for special characters in assembly names, and more informative error message.
  • added a test on big data
  • broken notebook
  • development notebook for baba
  • working on shareplots
  • testing caching numba funcs for faster run starts
  • added optional import of subprocess32
  • docs update
  • progress on baba
  • added option to add additional adapters to be filtered from paired data
  • Adding pairwise fst to manuscript analysis results. Begin work on raxml for manuscript analysis results.
  • Change a log message from info to warn that handles exceptions in rawedit.
  • abba baba updated
  • Fixed link in tetrad doc and cosmetic change to API docs.
  • Add comments to results notebooks.
  • Adding manuscript reference mapping results.
  • Manuscript analysis reference sequence mapping horserace updates. Stacks mostly done. dDocent started.
  • Adding ddRAD horserace nb.
  • Better cleanup during refmap merge_pairs (#211).
  • update for raxml-HYBRID
  • update raxml docs
  • cleanup old code
  • update raxml docs
  • updating raxml docs
  • update to bucky cookbook


  • bug fix to ensure chunk size of the tmparray in make-arrays is not greater than the total array size
  • fix for vcf build chunk error ‘all input arrays must have the same number of dimensions’. This was raised if no loci within a chunk passed filtering
  • allow vcf build to die gracefully
  • api cleanup


  • updated docs for popfile
  • fix for long endings on new outfile writing method
  • Made max size of the log file bigger by a zero.
  • Be nice and clean up a bunch of temporary files we’d been leaving around.
  • Better handling for malformed R1/R2 filenames.
  • api notebook update
  • more verbose warning on ipcluster error
  • allow setting ipcluster during Assembly instantiation
  • improved populations parser, and cosmetic
  • greatly reduced memory load with new func boss_make_arrays that builds the arrays into a h5 object on disk, and uses this to build the various output files. Also reduced disk load significantly by fixing the maxsnp variable bug which was making an empty array that was waay to big. Also added support for nexus file format. Still needs partition info to be added.
  • CLI ipcluster cluster-id=’ipyrad-cli-xxx’ to more easily differentiate from API
  • added note on threading
  • API cleanup func names
  • write outfiles h5 mem limit work around for build-arrays
  • step 1 with sorted-fastq-path no longer creates empty fastq dirs


  • API user guide updated
  • Added ipyclient.close() to API run() to prevent ‘too many files open’ error.
  • Bug fix for concatenation error in vcf chunk writer
  • added smarter chunking of clusters to make for faster muscle alignments
  • closed many subprocess handles with close_fds=True
  • added closure for open file handle
  • cleanup of API attributes and hidden funcs with underscores


  • Refmap: actually fix clustering when there are no unmapped reads.
  • Updated docs for parameters.


  • Refmap: Handle case where all reads map to reference sequence (skip unmapped clustering).
  • More refined handling of reference sequences with wacky characters in the chrom name like | and (. Who would do that?
  • Raxml analysis code added to Analysis Tools:
  • HPC tunneling documentation updated with more troubleshooting
  • Better handling of final alignments when they contain merged and unmerged sequences (#207)
  • added finetune option to loci2bpp Analysis tools notebook.
  • More improvements to manuscript analysis.
  • Finished simulated analysis results and plotting.
  • Improve communication if full raw path is wonky.
  • Horserace is complete for simulated and empirical. Continued improvement to gathering results and plotting.


  • Fix for 3Rad w/ only 2 cutters during filtering.
  • Better handling for malformed 3rad barcodes file.



  • improved progress bar
  • merge fix
  • notebook testing geno build
  • Fix to memory handling on vcf build, can now handle thousands of taxa. Also, now saves filepaths to json and API object.
  • progres on dstats package
  • More progress on manuscript horserace. Analysis is done, now mostly working on gathering results.


  • Fix error handing during writing of vcf file.


  • notebook testing
  • purge after each step to avoid memory spillover/buildup
  • better handling of memory limits in vcf build. Now producing geno output files. Better error reporting when building output files
  • added a global dict to util
  • new smaller limit of chunk sizes in h5 to avoid memory limits
  • analysis docs update
  • Document weird non-writable home directory on cluster issues.
  • docs update for filtering differences
  • merge fix
  • tetrad notebook edits
  • dstat calc script editing
  • Added code to copy barcodes during assembly merge. Barcodes are needed for all PE samples in step 2.


  • Better handling for PE with loci that have some merged and some unmerged reads.
  • Allow other output formats to try to build if vcf fails.
  • Fixed bug that was forcing creation of the vcf even if it wasn’t requested.


  • More improved handling for low/no depth samples.
  • Better handling for cleanup of samples with very few reads.


  • Catch sample names that don’t match barcode names when importing demux’d pair data.
  • Serious errors now print to ipyrad_log.txt by default.


  • Handle sample cleanup if the sample has no hidepth clusters.
  • Fix for declone_3rad on merged reads.
  • Better support for 3rad lining presorted fastqs.
  • bucky cookbook updated
  • dstat code updates
  • bucky cookbook uploaded


  • added tetrad docs
  • make tetrad work through API
  • added tetrad notebook


  • Swap out smalt for bwa inside refmapping. Also removes reindexing of reference sequence on -f in step 3.
  • fix for array error that was hitting in Ed’s data, related to 2X count for merged reads. This is now removed.
  • bug fix for 4/4 entries in vcf when -N at variable site.
  • prettier printing of stats file


  • fix for array error that was hitting in Ed’s data, related to 2X count for merged reads. This is now removed.
  • bug fix for 4/4 entries in vcf when -N at variable site.
  • prettier printing in s5 stats file
  • hotfix for large array size bug introduced in 0.4.8


  • bug fix to measure array dims from mindepth settings, uses statistical for s4, and majrule for s5
  • adding bwa binary for mac and linux
  • improved N removal from edges of paired reads with variable lengths
  • new parsing of output formats, and fewer defaults
  • only snps in the vcf is new default. Added pair support but still need to decide on spacer default. New cleaner output-formats stored as a tuple
  • small fix for better error catching
  • new hidepth_min attr to save the mindepth setting at the time when it is used
  • mindepth settings are now checked separately from other parameters before ‘run’ to see if they are incompatible. Avoids race between the two being compared individually in set-params.
  • new functions in steps 3-5 to accomodate changes to mindepth settings so that clusters-hidepth can be dynamically recalculated
  • fix to SSH tunnel docs
  • hotfix for step5 sample save bug. pushed to soon


  • make compatible with changes to s6
  • allow sample to fail s2 without crashing
  • cleaner progress bar and enforced maxlen trimming of longer reads
  • lowered maxlen addon, enforced maxlen trimming in singlecat
  • updates to docs
  • testing new maxlen calculation to better acommodate messy variable len paired data sets.
  • update to docs about pre-filtering
  • temporary fix for mem limit in step 6 until maxlen is more refined
  • Fix bug in refmap.


  • Nicely clean up temp files if refmap merge fails.


  • Add docs for running ipcluster by hand w/ MPI enabled.
  • Fix PE refmap bug #148
  • Documenting PYTHONPATH bug that crops up occasionally.
  • Adjusted fix to bgzip test.
  • Fixed a bug w/ testing for bgzip reference sequence. Also add code to fix how PE ref is handled to address #148.
  • fix for last fix
  • fix for last push gzip
  • collate with io.bufferedwriter is faster
  • faster collating of files
  • Continuing work on sim and empirical analysis.
  • rev on barcode in step2 filter pairgbs
  • faster readcounter for step1 and fullcomp on gbs filter=2 barcode in step2
  • tunnel docs update
  • working on a SSH tunnel doc page
  • Handle OSError in the case that openpty() fails.


  • Handle blank lines at the top of the params file.


  • making smoother progress bar in write vcfs
  • bugfix for jointestimate
  • testing bugfixes to jointestimate
  • default to no subsampling in jointestimate call
  • testing bugfixes to jointestimate
  • added hackersonly option for additional adapters to be filtered
  • bug fix to joint H,E estimate for large data sets introduced in v.0.3.14 that was yielding inflated rates.
  • fix for core count when using API
  • Added plots of snp depth across loci, as well as loci counts per sample to results notebook.
  • phylogenetic_invariants notebook up
  • some notes on output formats plans
  • removed leftjust arg b/c unnecessary and doesn’t work well with left trimmed data


  • Merging for Samples at any state, with warning for higher level states. Prettier printing for API. Fix to default cores setting on API.
  • fix for merged Assemblies/Samples for s2
  • fix for merged Assemblies&Samples in s3
  • removed limit on number of engines used during indexing
  • Added ddocent to manuscript analysis.
  • tutorial update
  • in progress doc notebook
  • parallel waits for all engines when engines are designated, up until timeout limit
  • parallelized loading demux files, added threads to _ipcluster dict, removed print statement from save
  • vcf header was missing
  • added step number to progress bar when in interactive mode
  • added warning message when filter=2 and no barcodes are present
  • improved kill switch in step 1
  • use select to improve cluster progress bar
  • added a CLI option to fine-tune threading
  • added dstat storage by default
  • new default trim_overhang setting and function (0,0,0,0)
  • fix for overzealous warning message on demultiplexing when allowing differences


  • Fixed reference before assignment error in step 2.


  • Cosmetic change
  • new sim data and notebook up
  • Added aftrRAD to the manuscript analysis horserace
  • made merging reads compatible with gzipped files from step2
  • modify help message
  • made TESTS global var, made maparr bug fix to work with no map info
  • More carefully save state after completion of each step.
  • limit vsearch merging to 2 threads to improve parallel, but should eventually make match to cluster threading. Added removal of temp ungzipped files.
  • more detailed Sample stats_df.s2 categories for paired data
  • made merge command compatible with gzip outputs from step2
  • simplified cutadapt code calls
  • updates to simdata notebook
  • merge conflict fix
  • new stats categories for step2 results
  • added adapter seqs to hackersdict
  • much faster vcf building
  • new step2 quality checks using cutadapt
  • small changes to use stats from new s2 rewrite. Breaks backwards compatibility with older assemblies at step3
  • massive rewrite of cluster across, faster indexing, way less memory overhead
  • just added a pylint comment
  • Adding cutadapt requirement for conda build
  • Suppress numpy mean of empty slice warnings.
  • Merged PR from StuntsPT. Fix to allow param restriction_overhang with only one enzyme to drop the trailing comma (,).
  • Merge branch ‘StuntsPT-master’
  • Adding a FAQ to the docs, including some basic ipyparallel connection debugging steps.
  • Adding documentation for the CLI flag for attaching to already running cluster.
  • Update docs to include more specifics about ambiguous bases in restriction overhang seqs.
  • Get max of max_fragment_length for all assemblies during merge()
  • Make gbs a special case for handling the restriction overhang.
  • Changed the way single value tuples are handled.
  • cleaning up releasenotes
  • added networkx to meta.yaml build requirements
  • “Updating ipyrad/ to version - 0.3.42


  • always prints cluster information when not using ipcluster[profile] = default
  • broke and then fixed samtools sorting on mac (BAM->bam)
  • better error message at command line
  • cleaned code base, deleting deprecated funcs.
  • revcomp function bug fix to preserve lower case pair splitter nnnn for pairgbs data
  • Adding requirement for numba >= 0.28 to support
  • Updating mac and linux vsearch to 2.0
  • docs updates (pull request #186) from StuntsPT/master
  • Added a troubleshooting note.
  • wrapped long running proc jobs so they can be killed easily when engines are interrupted
  • fix for API closing ipyclient view
  • fix for piping in subprocess
  • bug fix for missing subprocess module for zcat, and new simplified sps calls.
  • merge fix
  • allow for fuzzy match characters in barcode path
  • new simulated data set
  • uploaded cookbook for simulating data
  • no longer register ipcluster to die at exit, but rather call shutdown explicitly for CLI in the finally call of run()
  • massive code cleanup in refmapping, though mostly cosmetic. Simplified file paths and calls to subprocess.
  • massive restructuring to organize engine jobs in a directed acyclic graph to designate dependencies to ipyparallel. Lot’s of code cleanup for subprocess calls.
  • fix for progress bar cutting short in step 6. And simplified some code calling tmpdir.
  • Adding notebooks for ipyrad/pyrad/stacks simulated/emprical horserace.
  • Better handling for mindepth_statistical/majrule. Enforce statistical >= majrule.
  • Allow users with SE data to only enter a single value for edit_cutsites.
  • Properly finalize building database progress bar during step 6, even if some samples fail.
  • allow max_indels option for step 3 in API. Experimental.
  • bug fix to indel filter counter. Now applies in step7 after ignoring terminal indels, only applies to internal indels
  • much faster indexing using sorted arrays of matches from usort. Faster and more efficient build clusters func.
  • rewrote build_clusters func to be much faster and avoid memory limits. Other code cleanup. Allow max_indel_within option, though only in API currently.
  • numba update requirement to v.0.28


  • Reverting a change that broke cluster_within


  • Set vsearch to ignore max phred q score on merging pairs
  • Added bitarray dependency to conda build


  • Fix vsearch fastq max threshold arbitrarily high. Also remove debug crust.


  • Handle samples with few reads, esp the case where there are no matches during clustering.
  • Handle samples with few or no high depth reads. Just ignore them and inform the user.


  • Fix to allow pipe character in chrom names of reference sequences
  • Tweak to calculation of inner mate distance (round up and cast to int)
  • Refmap: fix calc inner mate distance PE, handle samples w/ inner mate distance > max, and handle special characters in ref seq chromosome names
  • Add a test to forbid spaces in project directory paths
  • Cosmetic docs fix
  • Cosmetic fix to advanced CLI docs
  • Added more explicit documentation about using the file to select samples during branching
  • Clarifying docs for qscore offset in the default params file
  • Cosmetic change to docs
  • Rolling back changes to build_clusters
  • “Updating ipyrad/ to version - 0.3.36
  • hotfix for edgar fix break
  • “Updating ipyrad/ to version - 0.3.36
  • hotfix for edgar fix break


  • hotfix for edgar fix break
  • “Updating ipyrad/ to version - 0.3.36
  • hotfix for edgar fix break


  • hotfix for edgar fix break


  • hotfix for memory error in build_clusters, need to improve efficiency for super large numbers of hits
  • more speed testing on tetrad
  • merge conflict
  • cleaner print stats for tetrad
  • finer tuning of parallelization tetrad


  • Handled bug with samtools and gzip formatted reference sequence
  • Fixed a bug where CLI was not honoring -c flag
  • debugging and speed tests
  • added manuscript dir
  • Update on Overleaf.
  • Manuscript project created
  • speed improvements to tetrad
  • smarter/faster indexing in tetrad matrix filling and speed up from skipping over invariant sites
  • finer tuning of bootstrap restart from checkpoint tetrad
  • print bigger trees for tetrad
  • fix to printing checkpoint info for tetrad
  • bug fix for limiting n cores in tetrad
  • made an extended majority rule consensus method for tetrad to avoid big import packages just for this.
  • testing timeout parallel
  • test notebook update
  • adding consensus mj50 function


  • new –ipcluster arg allows using a running ipcluster instance that has profile=ipyrad
  • temporary explicit printing during ipcluster launch for debugging
  • also make longer timeout in _ipcluster dict of Assembly object


  • temporary explicit printing during ipcluster launch for debugging
  • “Updating ipyrad/ to version - 0.3.33
  • also make longer timeout in _ipcluster dict of Assembly object


  • also make longer timeout in _ipcluster dict of Assembly object


  • increased timeout for ipcluster instance from 30 seconds to 90 seconds
  • Added sample populations file format example
  • quick api example up
  • merge conflict
  • removed chunksize=5000 option
  • Update README.rst


  • Fix optim chunk size bug in step 6 (very large datasets overflow hdf5 max chunksize 4GB limit)
  • Doc update: Cleaned up the lists of parameters used during each step to reflect current reality.
  • Fixed merge conflict in
  • Fix behavior in step 7 if requested samples and samples actually ready differ
  • Removing references to deprecated params (excludes/outgroups)
  • Simple error handling in the event no loci pass filtering
  • changed tetrad default mode to MPI
  • release notes update


  • changed name of svd4tet to tetrad
  • improved message gives info on node connections for MPI
  • added a test script for continuous integration
  • big cleanup to ipcluster (parallel) setup, better for API/CLI both
  • modified tetrad ipcluster init to work the same as ipyrad’s
  • generalized ipcluster setup


  • Changed behavior of step 7 to allow writing output for all samples that are ready. Allows the user to choose whether to continue or quit.
  • Fixed very stupid error that was not accurately tracking max_fragment_length.
  • Better error handling on malformed params file. Allows blank lines in params (prevents that gotcha).
  • Cosmetic changes to step 7 interaction if samples are missing from db
  • prettier splash
  • edited splash length, added newclient arg to run
  • testing MPI on HPC multiple nodes
  • updating docs parameters


  • Temp debug code in jointestimate for tracking a bug
  • Step 5 - Fixed info message for printing sample names not in proper state. Cosmetic but confusing.


  • Added statically linked binaries for all linux progs. Updated version for bedtools and samtools. Updated vsearch but did not change symlink (ipyrad will still use 1.10)
  • Bugfix that threw a divide by zero error if no samples were actually ready for step 5


  • Fixed a race condition where sometimes last_sample gets cleaned up before the current sample finishes, caused a KeyError. Very intermittent and annoying, but should work now


  • fix merge conflict
  • removed future changes to demultiplex, fixed 1M array size error
  • added notes todo
  • removed unnecessary imports
  • removed backticks from printouts
  • removed backticks from printouts
  • removed unnecessary ‘’ from list of args
  • code cleanup for svd4tet
  • update to some error messages in svd4tet
  • slight modification to -n printout
  • updated analysis docs
  • minor docs edits
  • updated releasenotes


  • better error message if sample names in barcodes file have spaces in them
  • VCF now writes chr (‘chromosomes’ or ‘RAD loci’) as ints, since vcftools and other software hate strings apparently
  • fix for concatenating multiple fastq files in step2
  • fix for cluster stats output bug


  • added nbconvert as a run dependency for the conda build


  • svd4tet load func improved
  • fixed bug with floating point numbers on weights. More speed improvements with fancy matrix tricks.
  • added force support to svd4tet
  • update releasenotes
  • added stats storage to svd4tet
  • loci bootstrap sampling implemented in svd4tet
  • init_seqarray rearrangement for speed improvement to svd4tet
  • removed svd and dstat storage attributes from Assembly Class
  • added a plink map output file format for snps locations
  • further minimized depth storage in JSON file. Only saved here for a quick summary plot. Full info is in the catg file if needed. Reduces bloat of JSON.
  • huge rewrite of svd4tet with Quartet Class Object. Much more concise code
  • big rearrangement to svd4tet CLI commands
  • code cleanup


  • only store cluster depth histogram info for bins with data. Removes hugely unnecessary bloat to the JSON file.
  • fixed open closure
  • massive speed improvement to svd4tet funcs with numba jit compiled C code
  • added cores arg to svd4tet


  • new defaults - lower maxSNPs and higher max_shared_Hs
  • massive reworking with numba code for filtering. About 100X speed up.
  • reworking numba code in svd4tet for speed
  • added debugger to svd4tet
  • numba compiling some funcs, and view superseqs as ints instead of strings gives big speedups
  • fix to statcounter in demultiplex stats
  • improvement to demultiplexing speed
  • releasenotes update
  • minor fix to advanced tutorial
  • updated advanced tutorial
  • forgot to rm tpdir when done
  • testing s6


  • bug fix for max_fragment_len errors for paired data and gbs
  • fix for gbs data variable cluster sizes.
  • prettier printing, does not explicitly say ‘saving’, but it’s still doing it.
  • numba update added to conda requirements
  • Wrote some numba compiled funcs for speed in step6
  • New numba compiled svd func can speed up svd4tet
  • update to analysis tools docs


  • fix for bug in edge trimming when assembly is branched after s6 clustering, but before s7 filtering


  • Better error handling for alignment step, and now use only the consensus files for the samples being processed (instead of glob’ing every consens.gz in the working directory
  • Fix a bug that catches when you don’t pass in the -p flag for branching
  • cleaning up the releasenotes


  • removed the -i flag from the command line.
  • fix for branching when no filename is provided.
  • Fix so that step 6 cleans up as jobs finish. This fixes an error raised if a dummy job finishes too quick.
  • removed a redundant call to open the allhaps file
  • Added a check to ensure R2 files _actually exist. Error out if not. Updated internal doc for link_fastq().
  • tmp fix for svd4tet test function so we can put up this hotfix


  • working on speed improvements for svd4tet. Assembly using purging cleanup when running API.
  • fix for KeyError caused by cleanup finishing before singlecats in step6
  • update to empirical tutorial


  • write nexus format compatible with ape in svd4tet outputs.
  • closing pipe was causing a stall in step6.


  • merge conflict fix
  • set subsample to 2000 high depth clusters. Much faster, minimal decrease in accuracy. Slightly faster code in s4.
  • better memory handling. Parallelized better. Starts non-parallel cleanups while singlecats are running = things go faster.
  • cluster was commented out in s6 for speed testing


  • Replaced direct call to with ipyrad.bins.vsearch
  • Fixed reference to old style assembly method reference_sub
  • Added ability to optionally pass in a flat file listing subsample names in a column.
  • Set a conditional to make sure params file is passed in if doing -b, -r, or -s
  • Softened the warning about overlapping barcodes, and added a bit more explanation
  • Set default max barcode mismatch to 0


  • Fixed infinite while loop inside __name_from_file


  • Fixed commented call to cluster(), step 6 is working again
  • Added a check to ensure barcodes contain only IUPAC characters
  • Fixed demultiplex sorting progress bar
  • append to the tmp-chunks directory to prevent users from running multiple step1 and stepping on themselves
  • Update README.rst
  • Added force flag for merging CLI
  • Bug in rawedit for merged assemblies
  • much faster indel entry in step6
  • chunks size optimization
  • optimizing chunk size step6
  • merge for lowmem fixes to step6
  • decided against right anchoring method from rad muscle alignments. Improved step6 muscle align progress bar
  • reducing memory load in step6
  • debug merge fix
  • improvement to debug flag. Much improved memory handling for demultiplexing


  • versioner now actually commits the releasenotes.rst


  • Versioner now updates the docs/releasenotes.rst
  • Eased back on the language in the performance expectations note
  • fixed all links to output formats file
  • blank page for recording different performance expectations


  • Added -m flag to allow merging assemblies in the CLI


  • Fix to SNP masking in the h5 data base so that stats counts match the number of snps in the output files.


  • Still in development


  • Still in development.
  • Step7 stats are now building. Extra output files are not.
  • New better launcher for Clients in ipyparallel 5


  • conda installation mostly working from ipyrad channel