Difference between revisions of "Read Types"
Latest revision as of 10:08, 21 June 2011
Generate bar plots showing the numbers or percentages of reads with different types of alignments. "Types" of alignments can include "none", "multiple", "single", "paired", "exonic", "intronic", etc.
Or, generate a "canonical distribution" plot, showing the positional distribution of read densities flanking 11 types of genomic "landmarks", based on the transcripts in the "knownCanonical" UCSC table:
 intergenic: one landmark created halfway between every pair of adjacent, nonoverlapping genes
 tss: transcription start sites
 start: start codon
 fe5ss: first exon 5' splice site
 intronic: one landmark created in the middle of each intron
 ie3ss: internal exon 3' splice site
 ie5ss: internal exon 5' splice site
 splice: special landmark, just quantifies read density over all the splice junctions of the knownCanonical transcripts.
 le3ss: last exon 3' splice site
 stop: stop codon
 pacs: polyadenylation/cleavage site
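The two "midpoint" landmark types in the list above (intergenic and intronic) can be illustrated with a minimal sketch. This is not the pipeline's own code: the function names, the 0-based half-open coordinate convention, the rounding down of midpoints, and the handling of overlapping or nested genes as a single block are all assumptions made for illustration.

```python
def intergenic_landmarks(genes):
    """Midpoints between adjacent, nonoverlapping genes on one chromosome.

    genes: list of (start, end) half-open intervals (assumed convention).
    Overlapping or nested genes are treated as one block (an assumption).
    """
    landmarks = []
    prev_end = None  # rightmost end seen so far
    for start, end in sorted(genes):
        if prev_end is not None and start >= prev_end:
            # A gap between the previous block and this gene: place one landmark halfway.
            landmarks.append((prev_end + start) // 2)
        prev_end = end if prev_end is None else max(prev_end, end)
    return landmarks


def intronic_landmarks(exons):
    """Midpoint of each intron of a single transcript.

    exons: list of (start, end) half-open intervals (assumed convention).
    """
    ordered = sorted(exons)
    return [
        (prev_end + next_start) // 2
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if next_start > prev_end  # skip exons with no gap between them
    ]


# Hypothetical coordinates, for illustration only.
example_genes = [(100, 200), (300, 400), (350, 500), (900, 1000)]
example_exons = [(0, 50), (150, 200), (400, 450)]
print(intergenic_landmarks(example_genes))
print(intronic_landmarks(example_exons))
```

Tracking the rightmost end seen so far, rather than comparing only neighbouring genes in sorted order, keeps a landmark from being placed inside a long gene that contains a shorter one.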
The options for read_types are as follows:
read_types Options  

Set

The project you would like to analyze. 
Read Class

Choosing "All" for read class tabulates the fate of all reads: nonmatching, multiply matching, single-end unique matching or paired-end unique matching.
Choosing "matching" shows the genomic features hit by just the aligning reads: exons, introns, both, splice junctions or intergenic.
Choosing "positional" generates a "canonical distribution" plot, showing the positional distribution of reads flanking canonical genomic landmarks.
Normalize

Whether the bars in the bar plots should be normalized to 100% ("Yes") or shown as numbers of reads ("No"). Not an option when readclass=positional.

ymax

Maximum RPKM value to show (only for readclass=positional). Leave blank to show the full range of RPKM.

width

Width of the image in pixels 
height

Height of the image in pixels 
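As a rough illustration of the Normalize option described above, the sketch below converts per-category read counts into percentages that sum to 100. The function name, the category labels, and the counts are hypothetical; this only shows the arithmetic, not how read_types itself implements it.

```python
def normalize_counts(counts):
    """Convert raw read counts per category into percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        # No reads at all: report 0% everywhere instead of dividing by zero.
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical counts for the "All" read class.
raw = {"none": 250, "multiple": 250, "single": 500}
print(normalize_counts(raw))
```

With Normalize set to "No", the bars would correspond to the raw counts; with "Yes", they would correspond to the percentages.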
Canonical Distribution Plots
To calculate the canonical distributions, each chromosomal position is mapped to the closest landmark. All positions further than 200 bases (or whatever value of radiuscanondist is supplied to RNASeqpipeline.pl) from the landmark are counted at 201 bases from the landmark, which may lead to an apparent drop-off or spike at the extreme positions (see for example the right extreme of the pacs distributions in the figure, in red). For each distance <math>d</math> from -201 to 201 and for each landmark type <math>t</math>, the number of positions <math>NC(t,d)</math> in the genome at that distance from the nearest landmark is counted (the sign indicates whether the position is downstream or upstream on the same strand as the landmark, or on the plus strand for intergenic landmarks). Then, the number of reads <math>NR(t,d)</math> overlapping at each distance from each type of landmark is counted. This is done in a way that counts each base of each alignment at the appropriate distance from its nearest landmark; usually the entire read will be nearest to a single landmark, but it is possible for the beginning of the read to have a different nearest landmark than the middle or end. Finally, the number of splice junctions in canonical transcripts is counted, along with the number of reads aligning to those splice junctions. The RPKM for a particular landmark type at a particular distance is calculated by normalizing the total number of reads whose alignment overlaps that distance from their nearest landmark of that type by the total number of chromosomal positions at that distance from that type of landmark and by the total number of reads <math>N</math> for that lane:
<math>\begin{align}
t &= \mbox{landmark type} \\
d &= \mbox{signed distance to nearest landmark} \\
NC(t,d) &= \mbox{number of chromosomal positions at distance } d \mbox{ to nearest landmark of type } t \\
NR(t,d) &= \mbox{number of read alignment positions at distance } d \mbox{ to nearest landmark of type } t \\
N &= \mbox{total number of aligning reads for lane} \\
RPKM(t,d) &= \frac{NR(t,d) \times 10^9}{NC(t,d) \times N}
\end{align}
</math> for <math>t \ne \mbox{splice}</math>. For splice junction reads there is no positional parameter, since distance from a splice junction is already captured by distance from a splice site. So, letting <math>NC(\mbox{splice})</math> denote the total number of splice junctions in canonical transcripts and <math>NR(\mbox{splice})</math> denote the total number of read alignments overlapping those junctions,