Last data update: 2014.03.03

ExampleData                R Documentation

Example datasets used in the manual pages and in the vignette

Description

Example datasets used in the manual pages and the vignette. They were generated by running HTSeq to count reads mapped to exonic regions (Observed) and to non-exonic regions (Background), and are accompanied by gene length information (genelength).

Usage

   data(ExampleData)

Format

ExampleData contains three data.frames. Two of them are expression matrices, named 'Observed' and 'Background'. In both, each row is a gene and each column is a sample; the entries are the numbers of reads mapped to the exonic regions (Observed) or the non-exonic regions (Background) of that gene. Both data.frames have 22609 rows and 6 columns. The third data.frame contains the gene length information.

Details

To use XBSeq for testing differential expression (DE), HTSeq must be run twice: once to count the reads mapped to exonic regions (observed signal) and once to count the reads mapped to non-exonic regions (background noise). First, construct the gtf annotation file used to measure the background noise:

  • Download the refFlat table from the UCSC database (http://genome.ucsc.edu) and create a preliminary list of gene-free regions,

  • Download the tables (a) all_mrna, (b) ensGene, (c) pseudoYale60Gene, (d) vegaGene, (e) xenoMrna, and (f) xenoRefGene from the UCSC database and remove any region that appears in any of them from the gene-free regions,

  • To ensure that gene-free regions are far enough from exonic regions, trim 100 bp from both sides of intronic regions and 1,000 bp from both sides of inter-genic regions,

  • Shift each exon of a gene to the nearest gene-free region on its right. Most shifted genes keep the structure of the original gene,

  • If the nearby gene-free region is too short, only the exon sizes may be preserved, not the whole gene structure. The priority for choosing the region to shift into is: 1) the nearest gene-free region on the right; 2) the nearest gene-free region on the left; 3) the second nearest gene-free region on the right, and so on until the original exon fits into the shifted region, and

  • Finally, the shifted regions are taken as the non-exonic regions for each gene, and the final .gtf file is created.
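
The trimming and shift-priority rules in the list above can be sketched in Python (the language HTSeq itself is written in). This is only an illustration, not the logic of the Perl scripts shipped with XBSeq. The function names trim_region and pick_shift_target, the half-open (start, end) tuples, and the reading of "and so on" as alternating right/left candidates are all assumptions made for the example.

```python
# Illustrative sketch only: names, data layout and the alternating
# right/left order are assumptions, not the XBSeq Perl implementation.

INTRON_TRIM = 100        # bp trimmed from each side of an intronic region
INTERGENIC_TRIM = 1000   # bp trimmed from each side of an inter-genic region


def trim_region(start, end, kind):
    """Trim a gene-free region (half-open [start, end)) on both sides.

    Returns the trimmed (start, end), or None if nothing is left.
    """
    margin = INTRON_TRIM if kind == "intron" else INTERGENIC_TRIM
    new_start, new_end = start + margin, end - margin
    return (new_start, new_end) if new_start < new_end else None


def pick_shift_target(exon, free_regions):
    """Pick the gene-free region that an exon would be shifted into.

    Candidates are tried in priority order: nearest right region,
    nearest left region, second nearest right region, and so on,
    until one is long enough to hold the exon.
    `free_regions` must be sorted by position and non-overlapping.
    """
    exon_start, exon_end = exon
    exon_len = exon_end - exon_start
    right = [r for r in free_regions if r[0] >= exon_end]
    left = [r for r in reversed(free_regions) if r[1] <= exon_start]
    # Interleave the two lists: right[0], left[0], right[1], left[1], ...
    candidates = []
    for i in range(max(len(right), len(left))):
        if i < len(right):
            candidates.append(right[i])
        if i < len(left):
            candidates.append(left[i])
    for region in candidates:
        if region[1] - region[0] >= exon_len:
            return region
    return None  # no gene-free region can hold this exon
```

For example, with gene-free regions (0, 50), (100, 120), (400, 420) and (600, 900), a 100 bp exon at (200, 300) skips the two nearest regions, which are too short, and lands in (600, 900).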

We carried out the HTSeq procedure twice on a mouse RNA-seq dataset, which contains 3 replicates of wild type mouse liver tissue (WT) and 3 replicates of Myc transgenic mouse liver tissue (MYC). The dataset was obtained from Gene Expression Omnibus (GSE61875) (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61875). The two datasets can be loaded via data(ExampleData) after loading the XBSeq library.
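
As a rough sketch of how the per-sample results of one HTSeq run could be assembled into a gene-by-sample matrix such as 'Observed': htseq-count writes one tab-separated gene ID and count per line, and recent versions append summary counters whose names start with `__` (for example `__no_feature`). The Python below parses such text and builds a table. The function names, sample labels and counts are made up for illustration, and this is not necessarily how ExampleData was assembled.

```python
# Illustrative sketch; function names, sample labels and counts are hypothetical.

def parse_htseq_counts(text):
    """Parse htseq-count output text into a {gene_id: count} dict.

    Lines whose ID starts with "__" (e.g. __no_feature) are summary
    counters, not genes, and are skipped.
    """
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        gene_id, value = line.split("\t")
        if gene_id.startswith("__"):
            continue
        counts[gene_id] = int(value)
    return counts


def build_matrix(per_sample_counts):
    """Combine {sample: {gene: count}} into (genes, samples, rows).

    rows[i][j] is the count of genes[i] in samples[j]; genes missing
    from a sample are filled with 0.
    """
    samples = list(per_sample_counts)
    genes = sorted({g for c in per_sample_counts.values() for g in c})
    rows = [[per_sample_counts[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, rows


# Toy output of the exonic run for two samples.
exonic = {
    "WT1": parse_htseq_counts("GeneA\t120\nGeneB\t30\n__no_feature\t500\n"),
    "MYC1": parse_htseq_counts("GeneA\t340\nGeneB\t12\n__no_feature\t450\n"),
}
genes, samples, observed = build_matrix(exonic)
```

Applying the same two functions to the output of the non-exonic run would give the matching 'Background' table, with rows and columns in the same order as long as both runs cover the same genes and samples.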

The annotation for measuring the background noise can be generated by following the steps above. First, generate the preliminary gene-free regions by running the script exonFreeRegionShift.pl <-EX exon-GTF file > <-FR gene free region>.

Then remove the potential functional elements by running the script GEFRshift.pl <-G gene-GTF.gtf > <-I intronRegion.tsv> <-T integenicRegion.tsv> optional: -m mRNA.bed -x xenoMrna.bed -z xenoRefGene.bed -e ensGene.bed -p pseudoGene.bed -v vegaGene.bed -b.

We have already generated gtf files for human (hg18 and hg19) and mouse (mm9 and mm10) and deposited them on GitHub. If you would like to generate your own gtf files, the scripts for doing so, which are written in Perl, are available in the package subfolder XBSeq/inst/scripts. The scripts are also deposited on GitHub (https://github.com/Liuy12/XBSeq).

Value

Three data.frames, as described in the Format section.

Author(s)

Yuanhang Liu

References

H. I. Chen, Y. Liu, Y. Zou, Z. Lai, D. Sarkar, Y. Huang, et al., "Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads," BMC Genomics, vol. 16 Suppl 7, p. S14, Jun 11 2015.
