# Ecdf

ECDF (Empirical Cumulative Distribution Function) shows the distribution of expression values or fold-changes for a restricted set of genes, plotted as an empirical cumulative distribution function. You supply a list of cluster IDs, and ExpressionPlot gets the normalized expression levels (see the stat option below) for each sample in the project, or the fold-changes for each comparison in the project, for all the genes referred to by those IDs. One way to generate the IDs is to grab them from the output of a table browser: download the table, then select the column of IDs in your spreadsheet software (like OpenOffice Calc, or, as a last resort, Microsoft Excel).

The options for ecdf are as follows:

ecdf Options
Set The project you would like to analyze.
Limit By default, all samples (or comparisons) are plotted. If you have a lot of them the plot can get messy, so you can specify a limit here by entering some text; only samples (or comparisons) whose names contain that text will be plotted. For example, you might have wildtype and mutant samples at different time points, with names like WT_1h, MT_3h, etc. To plot only the distributions for the wildtypes, supply the text "WT" for the limit; to plot only the 3 hour time point, supply the text "3h".

The limit is actually interpreted as a (Perl) regular expression, so you can use the full power of that engine to specify sample (or comparison) subsets.
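To illustrate how a regular-expression limit selects a subset of names, here is a minimal sketch in Python. It is not ExpressionPlot's own code: the function `limit_names` and the sample names are hypothetical, and Python's `re` module is only a stand-in for the Perl engine. The simple patterns shown here behave the same way in both, but more advanced Perl syntax may differ.

```python
import re

def limit_names(names, limit):
    """Keep the names that contain a match for the regular expression `limit`.

    Hypothetical helper: a name is kept if the pattern matches anywhere
    in it, which mirrors the "names contain that text" behaviour.
    """
    pattern = re.compile(limit)
    return [name for name in names if pattern.search(name)]

# Hypothetical sample names following the WT_1h / MT_3h naming scheme.
samples = ["WT_1h", "WT_3h", "MT_1h", "MT_3h"]

print(limit_names(samples, "WT"))          # wildtype samples only
print(limit_names(samples, "3h"))          # 3 hour time point only
print(limit_names(samples, "^MT_.*1h$"))   # anchors and wildcards narrow it further
```

Plain text such as "WT" is already a valid regular expression, which is why the simple substring usage and the full pattern syntax can share one input field.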

Cluster IDs Specify a space-separated list of cluster IDs using "Paste", or upload a list from a file using "Upload".

See the Limit by ID parameter from 2way for more information.

stat Two statistics are available for plotting:
• nlev: normalized level. Each gene's expression is converted into a Z-score by subtracting the gene's mean expression across all samples and dividing by its standard deviation:
$\text{nlev}=\frac{x_{g,s} - \overline{x_{g,\cdot}}}{\text{Stdev}(x_{g,\cdot})}$

(where $x_{g,s}$ indicates the expression level (RPKM or microarray intensity) of gene $g$ in sample $s$, $\overline{x_{g,\cdot}}$ indicates the mean expression of gene $g$ over all samples, and $\text{Stdev}(x_{g,\cdot})$ indicates the standard deviation of that gene's expression over all samples). Thus the genes' relative expression levels in the different samples are shown, rather than their raw expression levels.
• lfc: log-fold-change. Each gene's log fold change (base 2) is calculated for each comparison. The distribution of log-fold-changes across all genes is plotted for each comparison.
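The two statistics can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ExpressionPlot's implementation: the function names are hypothetical, the sample standard deviation (dividing by n-1) is assumed because the text does not say which estimator is used, and the direction of the fold-change (second level over first) is likewise an assumption.

```python
import math
import statistics

def nlev(levels):
    """Z-scores of one gene's expression levels across all samples.

    Assumes the sample standard deviation (n-1 denominator). A gene with
    identical levels in every sample has zero deviation and would raise
    ZeroDivisionError here.
    """
    mean = statistics.mean(levels)
    sd = statistics.stdev(levels)
    return [(x - mean) / sd for x in levels]

def lfc(level_a, level_b):
    """Base-2 log fold-change between two levels (assumed b relative to a)."""
    return math.log2(level_b / level_a)

# Hypothetical expression levels of one gene in three samples.
gene_levels = [2.0, 4.0, 6.0]
print(nlev(gene_levels))   # below-mean, at-mean, and above-mean samples
print(lfc(2.0, 8.0))       # a 4-fold increase
```

Because each gene is centred on its own mean, a normalized level of 0 always means "at this gene's average", regardless of how highly the gene is expressed.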
y=0.5 Add the line $y=0.5$ to the plot. This line intersects each distribution at its median.
x=0 Add the line $x=0$ to the plot. For log-fold-changes, this line indicates "unchanged". For normalized levels, it indicates the mean level of the gene.

width Width of the image in pixels
height Height of the image in pixels
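To make the y=0.5 and x=0 reference lines concrete, the following Python sketch builds an ECDF from a small set of values and reads both lines off it. The helper names and the log-fold-change values are hypothetical, and the step-function definition used here (fraction of values less than or equal to x) is the standard one, assumed rather than taken from ExpressionPlot's source.

```python
import statistics

def ecdf(values):
    """Return the ECDF as (x, fraction of values <= x) points, sorted by x."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def crossing(points, y=0.5):
    """Smallest x at which the ECDF reaches the height y."""
    for x, frac in points:
        if frac >= y:
            return x
    return None

def fraction_at_or_below(values, x=0.0):
    """Height of the ECDF where it meets the vertical line at x."""
    return sum(1 for v in values if v <= x) / len(values)

# Hypothetical log-fold-changes for five genes in one comparison.
lfcs = [-1.2, -0.3, 0.1, 0.4, 2.0]

points = ecdf(lfcs)
print(crossing(points))              # where the curve meets y = 0.5
print(statistics.median(lfcs))       # the median, for comparison
print(fraction_at_or_below(lfcs))    # share of genes that are unchanged or lower
```

With an odd number of values the crossing point is exactly the median. With an even number, this step-function definition gives the lower of the two middle values, whereas the conventional median averages them.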