R: Example Datasets used in the manual pages as well as in...
ExampleData
R Documentation
Example Datasets used in the manual pages as well as in the vignette
Description
Example datasets used in the manual pages and the vignette, produced by running HTSeq to count reads mapped to exonic regions (Observed) and to non-exonic regions (Background), together with gene length information (genelength).
Usage
data(ExampleData)
Format
ExampleData contains three data.frames. Two of them are expression matrices, named 'Observed' and 'Background'. In these two data.frames, each row holds the number of reads
mapped to the exonic (Observed) or non-exonic (Background) regions of one gene, and each column corresponds to one sample. Both data.frames have 22609 rows and 6
columns. The third data.frame contains the gene length information.
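The layout described above can be checked with a few lines of R. This is a minimal sketch, not part of the package: check_count_tables is a hypothetical helper, the toy tables are invented stand-ins, and the assumption that 'Observed' and 'Background' share row and column names is not stated in this page, so the helper only reports what it finds.

```r
# Hypothetical helper: report whether two count tables describe the same
# genes (rows) and samples (columns).
check_count_tables <- function(observed, background) {
  stopifnot(is.data.frame(observed), is.data.frame(background))
  list(
    same_dim     = identical(dim(observed), dim(background)),
    same_genes   = identical(rownames(observed), rownames(background)),
    same_samples = identical(colnames(observed), colnames(background))
  )
}

# Toy stand-ins with the documented layout (genes in rows, samples in columns).
toy_observed <- data.frame(S1 = c(10L, 0L, 25L), S2 = c(12L, 1L, 30L),
                           row.names = c("geneA", "geneB", "geneC"))
toy_background <- data.frame(S1 = c(1L, 0L, 3L), S2 = c(2L, 0L, 2L),
                             row.names = c("geneA", "geneB", "geneC"))
toy_check <- check_count_tables(toy_observed, toy_background)
print(toy_check)

# With XBSeq installed, the same check can be run on the real data.
# The data are loaded into a separate environment so that nothing in the
# workspace is overwritten.
if (requireNamespace("XBSeq", quietly = TRUE)) {
  env <- new.env()
  data("ExampleData", package = "XBSeq", envir = env)
  print(ls(env))
  if (all(c("Observed", "Background") %in% ls(env))) {
    print(dim(env$Observed))  # documented above as 22609 rows and 6 columns
    print(check_count_tables(env$Observed, env$Background))
  }
}
```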
Details
To use XBSeq for testing differential expression (DE), HTSeq has to be run twice to measure the reads mapped to exonic regions (observed signal) and
to non-exonic regions (background noise). First, construct the gtf annotation file used to measure the background noise:
Download the refFlat table from the UCSC database (http://genome.ucsc.edu) and create the preliminary list of gene-free regions,
Download the tables (a) all_mrna, (b) ensGene, (c) pseudoYale60Gene, (d) vegaGene, (e) xenoMrna, and (f) xenoRefGene from the UCSC database and remove
any region that appears in any of them from the gene-free regions,
To guarantee that gene-free regions are far enough from exonic regions, trim 100 bp from both sides of intronic regions and 1,000 bp from both sides
of inter-genic regions,
Shift each exon of a gene to the nearest gene-free region on its right. Most of the shifted genes keep the same structure as the original genes,
If the nearby gene-free region is too short, only the exon sizes may be preserved, not the whole gene structure. The priority for choosing
a region is: 1) the nearest gene-free region on the right; 2) the nearest gene-free region on the left; 3) the second nearest gene-free region on the right; and so on until
a region is found that fits the original exon, and
Finally, the shifted regions are taken as the non-exonic regions of each gene and a final .gtf file is created.
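Two of the steps above, trimming the gene-free regions and choosing a region for a shifted exon, can be sketched in base R. This is an illustration under stated assumptions, not the Perl implementation shipped with the package: trim_regions and pick_shift_region are hypothetical names, coordinates are taken to be 1-based and inclusive, chromosome and strand are ignored, and the alternation between right and left beyond the third candidate is a guess at what "and so on" means.

```r
# Trim a fixed margin from both ends of each region and drop regions that
# are too short to survive the trim.
trim_regions <- function(regions, margin) {
  regions$start <- regions$start + margin
  regions$end   <- regions$end - margin
  regions[regions$start <= regions$end, , drop = FALSE]
}

# Choose a gene-free region for one exon. Candidates are tried in the order
# nearest right, nearest left, second nearest right, second nearest left, ...
# (assumed), and the first one at least as long as the exon is returned.
pick_shift_region <- function(exon_start, exon_end, free) {
  exon_len <- exon_end - exon_start + 1
  right <- free[free$start > exon_end, , drop = FALSE]
  right <- right[order(right$start), , drop = FALSE]
  left  <- free[free$end < exon_start, , drop = FALSE]
  left  <- left[order(-left$end), , drop = FALSE]
  for (i in seq_len(max(nrow(right), nrow(left)))) {
    for (side in list(right, left)) {
      if (i <= nrow(side) && side$end[i] - side$start[i] + 1 >= exon_len) {
        return(side[i, , drop = FALSE])
      }
    }
  }
  NULL  # no gene-free region is long enough
}

# Invented example regions.
introns    <- data.frame(start = c(1000, 5000), end = c(1150, 6000))
intergenic <- data.frame(start = 10000, end = 20000)

# 100 bp off each side of intronic regions, 1,000 bp off inter-genic regions.
free <- rbind(trim_regions(introns, 100), trim_regions(intergenic, 1000))
print(free)

# An exon of 1,000 bp at 7000..7999: the nearest region on the right fits.
chosen <- pick_shift_region(7000, 7999, free)
print(chosen)
```

The first intronic region (151 bp) disappears after trimming because it is shorter than twice the margin, which is the intended effect of keeping gene-free regions away from exons.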
We ran HTSeq twice on a mouse RNA-seq dataset that contains 3 replicates of wild type mouse liver tissue (WT) and 3
replicates of Myc transgenic mouse liver tissue (MYC). The dataset was obtained from Gene Expression Omnibus (GSE61875)
(http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE61875). The resulting datasets can be loaded via data(ExampleData) after loading the XBSeq library.
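The design described above (3 WT and 3 MYC replicates) is usually encoded as a factor with one label per column of the count tables. The sketch below assumes that the WT samples come first, which this page does not state, and uses an invented toy table; group_totals is a hypothetical helper for a quick per-group sanity check.

```r
# One label per sample column. The order (WT first, then MYC) is an
# assumption; check the column names of the real data before relying on it.
conditions <- factor(rep(c("WT", "MYC"), each = 3), levels = c("WT", "MYC"))

# Hypothetical helper: sum a count table within each group, gene by gene.
group_totals <- function(counts, groups) {
  stopifnot(ncol(counts) == length(groups))
  sapply(levels(groups),
         function(g) rowSums(counts[, groups == g, drop = FALSE]))
}

# Invented toy counts: 2 genes by 6 samples.
toy <- data.frame(matrix(1:12, nrow = 2, ncol = 6,
                         dimnames = list(c("geneA", "geneB"),
                                         paste0("S", 1:6))))
totals <- group_totals(toy, conditions)
print(totals)
```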
The annotation for measuring the background noise can be generated by following the steps above. First, generate the preliminary gene-free regions by running
the script
exonFreeRegionShift.pl <-EX exon-GTF file > <-FR gene free region>.
Then remove the potential functional elements by running the script
GEFRshift.pl <-G gene-GTF.gtf > <-I intronRegion.tsv> <-T integenicRegion.tsv> optional: -m mRNA.bed -x xenoMrna.bed -z xenoRefGene.bed -e ensGene.bed
-p pseudoGene.bed -v vegaGene.bed -b.
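The two script invocations can be assembled from R before being passed to perl. This is a sketch only: the flags are copied from the usage lines above and read as "flag followed by its value", every file name is a placeholder, and the helper names exon_shift_args and gefr_shift_args are hypothetical. The commands are printed rather than executed.

```r
# Build the argument vector for exonFreeRegionShift.pl.
exon_shift_args <- function(script, exon_gtf, free_region) {
  c(script, "-EX", exon_gtf, "-FR", free_region)
}

# Build the argument vector for GEFRshift.pl. Optional flags such as
# -m, -x, -z, -e, -p, -v are passed through unchanged.
gefr_shift_args <- function(script, gene_gtf, intron_tsv, intergenic_tsv,
                            optional = character(0)) {
  c(script, "-G", gene_gtf, "-I", intron_tsv, "-T", intergenic_tsv, optional)
}

# Placeholder paths; replace them with real files.
step1 <- exon_shift_args("exonFreeRegionShift.pl", "exons.gtf", "geneFree.tsv")
step2 <- gefr_shift_args("GEFRshift.pl", "genes.gtf", "intronRegion.tsv",
                         "intergenicRegion.tsv",
                         optional = c("-m", "mRNA.bed", "-x", "xenoMrna.bed"))

# Print the commands instead of running them.
cat("perl", step1, "\n")
cat("perl", step2, "\n")

# To run a step for real (requires perl and the scripts on disk):
# system2("perl", args = shQuote(step1))
```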
We have already generated gtf files for human (hg18 and hg19) and mouse (mm9 and mm10) and deposited them on GitHub. If you
would like to generate your own gtf files, the Perl scripts that generate them are available in the package subfolder XBSeq/inst/scripts.
The scripts are also available on GitHub (https://github.com/Liuy12/XBSeq).
Value
Three data.frames, as described in the Format section.
Author(s)
Yuanhang Liu
References
H. I. Chen, Y. Liu, Y. Zou, Z. Lai, D. Sarkar, Y. Huang, et al.,
"Differential expression analysis of RNA sequencing data by
incorporating non-exonic mapped reads," BMC Genomics, vol. 16
Suppl 7, p. S14, Jun 11 2015.