Getting Started: Demultiplexing
Demultiplexing is the process of sorting sequenced reads into separate files for each sample in a sequenced run. You may have received your data already demultiplexed, with a separate file for each sample. If so, then you can proceed to the next section. If your data are not yet sorted into separate files then you will need to perform demultiplexing during step 1 of the ipyrad assembly.
Multiplexing and Multiple Libraries
If your data are not yet sorted by sample, you will need a barcodes file that links each barcode/index to a sample so that reads can be sorted into a separate file for each sample. ipyrad has several options for demultiplexing by internal barcodes or external i7 indices. It can also combine samples from many different sequencing runs into a single analysis or split them into separate analyses, and it can merge data from multiple sequenced lanes into the same sample names (e.g., technical replicates). See the Demultiplexing section for simple examples, and the Cookbook section for further detailed examples.
Sample Names
When demultiplexing, Sample names are extracted from the barcodes file, whereas if your data are already demultiplexed, Sample names are extracted directly from the file names. Do not include spaces in file names. For paired-end data we need to be able to identify which R1 and R2 files go together, so every read1 file name must contain the string _R1_ (with underscores before and after), and every R2 file name must match its R1 file name exactly except that it has _R2_ in place of _R1_. See the tutorial data files for an example.
Note
Pay careful attention to file names at the very beginning of an analysis since these names, and any included typos, will be perpetuated through all the resulting data files. Do not include spaces in file names.
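The naming rules above can be checked before starting an assembly. The following is a minimal sketch, not part of ipyrad: the function name check_paired_names is hypothetical, and it assumes you pass it the plain file names (not full paths) of a paired-end data set.

```python
def check_paired_names(filenames):
    """Return a list of naming problems found in paired-end fastq file names.

    Hypothetical helper (not an ipyrad function). It checks the rules
    described above: no spaces, and every _R1_ file has a matching _R2_ file.
    """
    names = set(filenames)
    problems = []
    for name in sorted(names):
        if " " in name:
            problems.append(f"space in file name: {name!r}")
        if "_R1_" in name:
            # the R2 name must be identical except for _R2_ in place of _R1_
            mate = name.replace("_R1_", "_R2_", 1)
            if mate not in names:
                problems.append(f"no R2 mate for: {name!r}")
        elif "_R2_" in name:
            mate = name.replace("_R2_", "_R1_", 1)
            if mate not in names:
                problems.append(f"no R1 mate for: {name!r}")
        else:
            problems.append(f"missing _R1_ or _R2_: {name!r}")
    return problems
```

An empty list means that no problems were found. For single-end data the pairing checks do not apply, so only the check for spaces is relevant.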
Barcodes file
The barcodes file is a simple table linking barcodes to samples. Barcodes can be of varying lengths. Each line should have one name and then one barcode, separated by whitespace (a tab or spaces).
sample1 ACAGG
sample2 ATTCA
sample3 CGGCATA
sample4 AAGAACA
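Because typos in this file carry through the whole analysis, it can help to check the table before demultiplexing. The sketch below is one possible check and is not ipyrad's own parser: parse_barcodes is a hypothetical name, and treating a repeated barcode as an error is an assumption made for illustration. Repeated sample names are allowed, since the same name may be used for technical replicates.

```python
def parse_barcodes(text):
    """Parse barcodes-file text into a list of (sample_name, barcodes) tuples.

    Hypothetical helper (not an ipyrad function). Each non-blank line must
    hold a name followed by one barcode, or two for combinatorial indexing,
    separated by whitespace.
    """
    records = []
    seen = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()  # splits on tabs or spaces
        if not fields:
            continue  # skip blank lines
        name, barcodes = fields[0], tuple(fields[1:])
        if len(barcodes) not in (1, 2):
            raise ValueError(f"line {lineno}: expected 1 or 2 barcodes for {name}")
        if barcodes in seen:
            # assumption: the same barcode (pair) should not appear twice
            raise ValueError(f"line {lineno}: barcode {barcodes} is repeated")
        seen.add(barcodes)
        records.append((name, barcodes))
    return records
```

To check a file on disk, pass its contents, for example the result of open(path).read().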
Combinatorial indexing
To perform combinatorial indexing you will need to enter two barcodes for each sample name. These should be ordered so that the barcode on read1 is first and the barcode on read2 second. A simple way to ensure that barcodes are attached to your reads in the way that you expect is to look at the raw data files (e.g., use the command line tool less) and check for the barcode sequences.
sample1 ACAGG TTCCG
sample2 ATTCA CCGGAA
sample3 CGGCAT GAGTCC
sample4 AAGAAC CACCG
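As an alternative to scanning the raw files by eye, the check described above can be scripted. This is a rough sketch under stated assumptions: count_barcode_starts is a hypothetical helper, the input is standard four-line fastq records, and barcodes sit at the very start of each read with no mismatches.

```python
from collections import Counter
from itertools import islice


def count_barcode_starts(fastq_lines, barcodes, max_reads=10000):
    """Count how many of the first max_reads reads begin with each barcode.

    Hypothetical helper (not an ipyrad function).
    fastq_lines: iterable of text lines from a 4-line-per-record fastq file.
    """
    counts = Counter()
    # try longer barcodes first, in case one barcode is a prefix of another
    ordered = sorted(barcodes, key=len, reverse=True)
    # the sequence is the 2nd line of every 4-line record
    seqs = islice(fastq_lines, 1, None, 4)
    for seq in islice(seqs, max_reads):
        seq = seq.strip()
        for bc in ordered:
            if seq.startswith(bc):
                counts[bc] += 1
                break
    return counts
```

For combinatorial indexing, run it once on the R1 file with the first-column barcodes and once on the R2 file with the second-column barcodes. A gzipped file can be read with gzip.open(path, "rt"). If most reads match no barcode, the column order or the barcodes themselves may not be what you expect.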
i7 indexing
Barcodes/indexes can also be attached outside the sequenced read, on the Illumina adapters. This is often used to combine multiple plates on a single sequencing run. You can find the i7 index in the header line of each read in a fastq file. ipyrad can demultiplex using i7 indices if you turn on a special flag. An example of how to do this using the ipyrad API is available in the Cookbook section.
lib1 CCGGAA
lib2 AATTCC
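To see which i7 indices are present in a file, you can pull them out of the read headers. The sketch below is not ipyrad code: i7_from_header is a hypothetical helper, and it assumes the common Illumina header layout in which the index is the last colon-separated field after the space, written as i7+i5 on dual-indexed runs. Some files store a sample number in that position instead, so inspect a few headers first.

```python
def i7_from_header(header):
    """Return the i7 index from a fastq header line, or None if absent.

    Hypothetical helper (not an ipyrad function). Assumes a header such as
    '@instrument:run:flowcell:lane:tile:x:y 1:N:0:CCGGAA+AGGCTT'.
    """
    parts = header.strip().split()
    if len(parts) < 2:
        return None  # no comment field after the read name
    # the index is the last colon-separated field of the comment
    index = parts[-1].split(":")[-1]
    # dual-indexed runs write 'i7+i5'; keep only the i7 part
    return index.split("+")[0]
```

Applying this to every fourth line of a fastq file (the header lines) and counting the results shows whether the indices match those listed in the barcodes file.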
Combining multiple libraries
With ipyrad it is easy to combine multiple sequenced libraries into a single assembly: demultiplex each lane of data separately, then merge the resulting libraries. See the merging section for details, and the Cookbook section for examples.