Pairdist

From ExpressionPlot
Example pairdist plot. In addition to this plot, pairdist generates a barplot summarizing the non-canonical alignment types.

For paired-end data, this tool generates two plots. The first plot summarizes the "non-canonical" pairs: those read pairs whose two ends either

  1. align to different chromosomes (chr_ne)
  2. align to the same chromosome but on the same strand (str_eq)
  3. align to the same chromosome on different strands, but with the minus strand read upstream of the plus strand read (pos2_lt_pos1), or
  4. align to the same chromosome on different strands, with the minus strand read downstream of the plus strand read and the two reads flanking a known intron (flank_intron).

The percentage of reads of each non-canonical type is shown as a barplot in the first plot.
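
The four categories above can be sketched as a small classifier. This is only an illustration, not ExpressionPlot's own code: the `Read` type, the function names, the half-open coordinates, the comparison of start positions for "upstream", and the rule that an intron is "flanked" when it lies entirely between the two reads are all assumptions made for the example. The percentages here are taken over all pairs, which is also an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Read(NamedTuple):
    """One aligned end of a pair (hypothetical type for this sketch)."""
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


# Labels as named on this page.
NON_CANONICAL = ("chr_ne", "str_eq", "pos2_lt_pos1", "flank_intron")

# Known introns per chromosome as (start, end) half-open intervals (assumed layout).
Introns = Dict[str, List[Tuple[int, int]]]


def classify_pair(r1: Read, r2: Read, introns: Introns) -> str:
    """Return one of the non-canonical labels, or "canonical"."""
    if r1.chrom != r2.chrom:
        return "chr_ne"
    if r1.strand == r2.strand:
        return "str_eq"
    plus, minus = (r1, r2) if r1.strand == "+" else (r2, r1)
    # Assumption: "upstream" is decided by comparing start positions.
    if minus.start < plus.start:
        return "pos2_lt_pos1"
    # Assumption: a pair flanks an intron if the intron lies between the reads.
    for i_start, i_end in introns.get(plus.chrom, []):
        if plus.end <= i_start and i_end <= minus.start:
            return "flank_intron"
    return "canonical"


def type_percentages(pairs: Iterable[Tuple[Read, Read]],
                     introns: Introns) -> Dict[str, float]:
    """Percentage of all pairs falling into each non-canonical type."""
    counts = Counter(classify_pair(a, b, introns) for a, b in pairs)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in NON_CANONICAL}
    return {label: 100.0 * counts[label] / total for label in NON_CANONICAL}
```

The checks run in the order of the list, so each pair receives exactly one label and the bar heights can be read as disjoint fractions.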

The remainder of paired-end reads are considered "canonical": the two ends map to different strands of the same chromosome with the minus strand read downstream of the plus strand read, and the two ends do not flank any known introns. In these cases the insert size (defined here as the length of the unsequenced portion between the reads, which can be negative if the reads actually overlap) can be calculated. The second plot then shows ECDFs of insert sizes for each sample.
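
As a rough illustration of the insert size definition and of what an ECDF is, the following sketch may help. The function names and the half-open coordinate convention are assumptions for the example; the page does not say how pairdist computes these internally.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def insert_size(plus_end: int, minus_start: int) -> int:
    """Unsequenced length between the two reads of a canonical pair.

    plus_end is the exclusive end of the plus strand read and minus_start
    is the inclusive start of the minus strand read (assumed convention).
    The result is negative when the reads overlap.
    """
    return minus_start - plus_end


def ecdf(values: Sequence[int]) -> List[Tuple[int, float]]:
    """Empirical CDF as (x, fraction of values <= x) at each distinct x."""
    if not values:
        return []
    counts = Counter(values)
    n = len(values)
    points = []
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]
        points.append((x, cumulative / n))
    return points
```

For two 36-base reads that cover exactly the same positions, for instance 100 to 135, `insert_size(136, 100)` gives -36.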

Pairdist Options
Set The project for which you want to examine paired-end distributions. Only paired-end RNA-Seq projects will appear in this list.
pd-min The minimum insert size that you want to show in the ECDF (second plot). Some insert sizes may be negative, indicating that the two reads overlap each other. For example, if you have 36 base paired-end reads, an insert size of -36 indicates that the two ends sequence the same positions, as reverse complements. Leave blank to not have any lower limit.

pd-max The maximum insert size that you want to show in the ECDF (second plot). Although leaving this field blank will show all insert sizes, you will usually want to impose some sort of maximum, since a small fraction of reads will have very large apparent insert sizes. These are probably due to either unannotated introns or artifacts, and they will dwarf the part of the distribution that is actually of interest.
width Width of the image in pixels
height Height of the image in pixels
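
One way to picture the effect of pd-min and pd-max is as an optional range filter on the insert sizes. The sketch below is hypothetical: the function name is invented, `None` stands in for a blank field, the bounds are treated as inclusive, and it is an assumption that values outside the range are dropped rather than merely hidden by the axis limits.

```python
from typing import Iterable, List, Optional


def filter_insert_sizes(sizes: Iterable[int],
                        pd_min: Optional[int] = None,
                        pd_max: Optional[int] = None) -> List[int]:
    """Keep insert sizes within [pd_min, pd_max]; None means no limit."""
    return [
        s for s in sizes
        if (pd_min is None or s >= pd_min) and (pd_max is None or s <= pd_max)
    ]
```

With sizes of -36, 0, 150, 250 and 48000, setting only `pd_max=500` would drop the single very large value, which is the kind of outlier that otherwise dwarfs the rest of the distribution.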