VirtualBox Installation

Install VirtualBox

If your machine doesn't already have VirtualBox installed, you must download and install it. I run Ubuntu Linux 10.04 and did this with the following command:

 sudo apt-get install virtualbox

The ExpressionPlot VM has been tested on VirtualBox 3.2.8 and may not work on some earlier versions. You can get the latest version of VirtualBox for your OS on the VirtualBox website.

Download ExpressionPlot Virtual Hard Disks

Next, download the latest ExpressionPlot virtual hard disk (810 MB) by clicking on the link, or by using wget or curl.

Create your new Virtual Machine

Now you have to set up your new virtual machine. The main decision to make here is how much memory to allocate. Only the alignment step (bowtie) requires a large amount of memory. Therefore, if you just want to run the pre-computed data sets, or to use pre-aligned data (from BAM files, for example from tophat output), you should be fine with only 2 GB or even less (I have gone as low as 1792 MB with no problems). On the other hand, if you want to run new datasets completely through the ExpressionPlot pipeline, you will need at least 3 GB of memory, preferably 4 GB, because the bowtie indexes alone are almost 3 GB. If you are going to work with a smaller bowtie index, or if you aren't going to do alignments, then you can get away with less. In any case, remember that you can't allocate all your memory to the guest machine; you probably have to leave at least 500 MB for your host machine.
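The memory reasoning above can be turned into a small calculation. This is only a sketch: the 500 MB host reserve and the 4 GB preferred size come from the paragraph above, while the function name suggest_guest_mem_mb and the policy of capping the guest at 4096 MB are assumptions made for illustration.

```shell
# Suggest a guest memory size (in MB) from the host's total memory (in MB).
# The 500 MB host reserve and the 4096 MB preferred size come from the text;
# the function name and the cap-at-4-GB policy are illustrative assumptions.
suggest_guest_mem_mb() {
    host_mb=$1
    reserve_mb=500
    avail_mb=$((host_mb - reserve_mb))
    if [ "$avail_mb" -ge 4096 ]; then
        echo 4096            # preferred size for running bowtie alignments
    elif [ "$avail_mb" -gt 0 ]; then
        echo "$avail_mb"     # give the guest whatever is left after the reserve
    else
        echo "host has too little memory" >&2
        return 1
    fi
}

suggest_guest_mem_mb 8192   # prints 4096
suggest_guest_mem_mb 2560   # prints 2060
```

On a Linux host, the total memory in MB can be read with free -m (the second field of the "Mem:" line) and passed to the function. A result below about 3072 suggests sticking to pre-computed or pre-aligned data.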

To create the virtual machine, follow these steps:

  1. Open VirtualBox. Under Ubuntu it can be found in Applications->System Tools->Oracle VM VirtualBox, or started by running VirtualBox from the command line.
  2. Click New to open the Create New Virtual Machine wizard.
    • Name your virtual machine, then choose Linux as the OS and Ubuntu as the Version.
    • Set your base memory size: at least 1.5 GB to try out ExpressionPlot, at least 4 GB if you want to run the pipeline with new alignments.
    • On the next screen you'll choose your virtual boot hard disk. Make sure "Boot Hard Disk" is checked, then choose Use Existing Hard Disk. Choose the .vdi file you downloaded. If it's not in the drop-down menu, click the little folder icon to the right of the drop-down menu to open the Virtual Media Manager. A new window will pop up. Click the "Add" icon within that window, then navigate to the location of the Expression.vdi file. Select the file and return to the wizard.
    • Click "Next" then, on the final window of the wizard, click "Finish".

Alternatively, you could use the command line to create a headless virtual machine.
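Here is a minimal sketch of the command-line route mentioned above. It assumes a VM named ExpressionPlot, a disk saved as $HOME/Expression.vdi, and a SATA controller; none of these are prescribed by ExpressionPlot. The helper print_create_vm_commands is hypothetical. It only prints standard VBoxManage subcommands (createvm, modifyvm, storagectl, storageattach) so that you can review them first. Option spellings can differ between VirtualBox releases, and older releases may require the disk to be registered before it can be attached, so check the help output of your installed VBoxManage before running anything.

```shell
# Print the VBoxManage commands that would create and configure the VM.
# Nothing is executed here; review the output before running it.
# The controller name "SATA" and the argument order are illustrative choices.
print_create_vm_commands() {
    vm=$1; vdi=$2; mem_mb=$3
    cat <<EOF
VBoxManage createvm --name "$vm" --ostype Ubuntu --register
VBoxManage modifyvm "$vm" --memory $mem_mb
VBoxManage storagectl "$vm" --name SATA --add sata
VBoxManage storageattach "$vm" --storagectl SATA --port 0 --device 0 --type hdd --medium "$vdi"
EOF
}

print_create_vm_commands ExpressionPlot "$HOME/Expression.vdi" 4096
```

Once the printed commands look right for your setup, pipe the output to sh to run them.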

Configure Host Forwarding

By default, a VirtualBox guest cannot be reached by ssh or web from the outside network, not even from the host machine. If you want, you can tell VirtualBox to make your VM accessible on specific ports of your host machine. For example, you could set up port 2222 of the host machine for ssh into the guest (virtual machine), and port 8080 to forward web requests to the guest. Once set up in this manner, I can ssh into the guest with ssh -p 2222 followed by my host machine's name, and I can point my browser to port 8080 of the host. These two commands will set up port forwarding for ssh and web:

VBoxManage modifyvm $ExpressionPlot --natpf1 "SSH,TCP,,2222,,22"
VBoxManage modifyvm $ExpressionPlot --natpf1 "Web,TCP,,8080,,80"

Here $ExpressionPlot refers to the name of the virtual machine. These commands should be run from the command line of the host machine. You can run them whenever you like, but they won't go into effect until the next time your guest is started.
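The quoted rule strings in the commands above have six comma-separated fields: rule name, protocol, host IP, host port, guest IP, and guest port. The two IP fields are left empty. As an illustration, the following sketch builds such strings; natpf_rule is a hypothetical helper name, not part of VirtualBox.

```shell
# Build a --natpf1 rule string: name,protocol,hostip,hostport,guestip,guestport.
# Host and guest IP fields are left empty, as in the commands above.
# natpf_rule is a hypothetical helper name, not part of VirtualBox.
natpf_rule() {
    name=$1; proto=$2; host_port=$3; guest_port=$4
    printf '%s,%s,,%s,,%s\n' "$name" "$proto" "$host_port" "$guest_port"
}

natpf_rule SSH TCP 2222 22    # prints SSH,TCP,,2222,,22
natpf_rule Web TCP 8080 80    # prints Web,TCP,,8080,,80

# Usage with VirtualBox installed:
#   VBoxManage modifyvm ExpressionPlot --natpf1 "$(natpf_rule SSH TCP 2222 22)"
```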

To check the status of the port forwarding run

 VBoxManage showvminfo $ExpressionPlot

and look for lines beginning with "NIC 1 Rule". Finally, you can undo these settings as follows:

VBoxManage modifyvm $ExpressionPlot --natpf1 delete SSH
VBoxManage modifyvm $ExpressionPlot --natpf1 delete Web
  • If your virtual machine boots and you can log in but networking doesn't seem to work in or out (even after setting up host forwarding), you may have a mis-named virtual network interface, but this can be easily fixed.
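The status check described above (looking for lines beginning with "NIC 1 Rule") can be scripted with a simple filter. This is a sketch only; forwarding_rules is a hypothetical helper name, and it reads the report on standard input so it can also be tried on saved showvminfo output.

```shell
# Keep only the port-forwarding lines from `VBoxManage showvminfo` output.
# Reads the report on stdin so it can be tried on saved output as well.
# forwarding_rules is a hypothetical helper name.
forwarding_rules() {
    grep '^NIC 1 Rule'
}

# Usage with VirtualBox installed:
#   VBoxManage showvminfo ExpressionPlot | forwarding_rules
```

Because grep exits with a non-zero status when nothing matches, the pipeline can also be used in an if test to detect that no forwarding rules are set.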

Start up the machine: GUI

Log in using

 username: expressionplot
 password: highthroughput

This is also the username and password for the mysql database and for the website.

After the login process is complete you can open the ExpressionPlot website by clicking the link on the desktop.

Since the virtual machine runs Ubuntu Server (not Ubuntu Desktop), it does not have X Windows installed, so you will only get a text terminal.

Start up the machine: headless (recommended)

"Headless" means that the machine won't have a GUI. This is an advantage if your host machine is a remote server and you want to be able to disconnect but leave your guest running. You will also need to have set up some sort of port forwarding to use this mode (I recommend both ssh and web forwarding). To start your headless machine, simply run the following on a host command line:

VBoxHeadless -s ExpressionPlot &

It can be managed with VBoxManage. There are many useful subcommands, but you might find the following two particularly helpful. To list all currently running virtual machines, use

VBoxManage list runningvms

and to shut off a virtual machine, use

VBoxManage controlvm ExpressionPlot poweroff

More information can be found by running VBoxManage without any argument or consulting the VirtualBox Users' Manual.
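The two subcommands above combine naturally: check whether the guest is running before powering it off. The sketch below assumes that VBoxManage list runningvms prints one line per VM in the form "name" {uuid}; vm_is_running is a hypothetical helper name, and it reads the list on standard input.

```shell
# Succeed if the named VM appears in `VBoxManage list runningvms` output,
# which is assumed to print one line per VM in the form: "name" {uuid}
# Reads the list on stdin; vm_is_running is a hypothetical helper name.
vm_is_running() {
    grep -q -F "\"$1\" {"
}

# Usage with VirtualBox installed:
#   if VBoxManage list runningvms | vm_is_running ExpressionPlot; then
#       VBoxManage controlvm ExpressionPlot poweroff
#   fi
```

Matching the quoted name followed by the opening brace avoids false hits on VMs whose names merely start with the same text.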


After you complete the initial installation you will have the base ExpressionPlot system, but you won't be able to do any analysis until you add some data. This is easily done with the add-on script located in the util/ subdirectory of the ExpressionPlot home.

If you are only running the front-end (you have your own back-end), then strictly speaking you could get away without adding anything on, but you would of course have to populate the database by other methods. Even so, it might be useful to download at least an annotation for the genome that you are using so that the SeqView Tool can show known transcripts along with your data.

One way to just try out ExpressionPlot is to get it running on some human tissue panel data. Here is a sequence of add-ons that should populate your database with some of the processed data (the download for the hg18 annotation is about 70 MB and for the tissue panel about 600 MB, so it may take a little while):

# Go to ExpressionPlot util directory
cd `expressionplot-config`/util

# Get hg18 annotation files
./ get_annot hg18

# Get processed Human Tissue Panel data
./ get_project Human_Tissue_Panel_processed

If you want to try out the back-end, then you could download the raw sequencing data instead of the processed data. This download is bigger (1.6GB), and you still will need the annotation files. These commands will download the annotation and the raw data, then start the pipeline:

# Go to ExpressionPlot util directory
cd `expressionplot-config`/util

# Get hg18 annotation files
./ get_annot hg18

# Get Human Tissue Panel sequences
./ get_project Human_Tissue_Panel

# Start up screen (optional)
screen -S EP-pipeline-on-Tissue-Panel

# Start pipeline
cd $EP/projects/Human_Tissue_Panel
$EP/RNASeq/ lanes.txt hg18 -j hg18_all_junctions -hjl 31 \
  -cl $EP/annot/hg18/hg18_trimmed_gene_clusters.tsv \
  -ae $EP/annot/hg18/hg18_acembly_AE_events_with_flanking_SS.tsv \
  -p iDEA -l pipeline.%d%b%y.log \
  -ri $EP/annot/hg18/hg18_acembly_intron_events.xls \
  -ate $EP/annot/hg18/hg18_ensGene_term_exons.tsv \
  -ensT $EP/annot/hg18/hg18_ensembl_and_tRNA_clusters.tsv \
  -admin expressionplot

Then point your web browser to http://SERVERNAME/cgi-bin/expressionplot/ and you are ready to go!

See Installing add-ons for more details.