Last data update: 2014.03.03

R: Custom mappings added to the package
illuminaHumanv2listNewMappings    R Documentation

Custom mappings added to the package

Description

We have used an extensive re-annotation of the illuminaHumanv2 probe sequences to provide additional information that is not captured in the standard Bioconductor packages. Whereas Bioconductor annotations are based on the RefSeq ID that each probe maps to, our additional mappings provide data specific to each probe on the platform. See below for details. We recommend using the probe quality as a form of filtering, and retaining only perfect or good probes for an analysis.
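The recommended filtering step can be sketched as follows. This is a minimal illustration on toy data, not the package's own procedure: the probe IDs are hypothetical placeholders, and the grades carrying trailing asterisks (which appear in the table of qualities in the Results, but are not described on this page) are assumed here to be variants of the base grade.

```r
# Toy stand-in for unlist(as.list(illuminaHumanv2PROBEQUALITY)):
# a named character vector of quality grades keyed by probe ID.
# The probe IDs below are hypothetical placeholders.
probe_quality <- c(
  probe_A = "Perfect",
  probe_B = "Good***",
  probe_C = "Bad",
  probe_D = "No match",
  probe_E = "Perfect****",
  probe_F = "Good"
)

# Assumption: trailing asterisks mark variants of the base grade,
# so "Good***" is retained alongside "Good".
keep <- grepl("^(Perfect|Good)", probe_quality)

# IDs of the probes that pass the quality filter
retained_probes <- names(probe_quality)[keep]
print(retained_probes)
```

With real data, the same logical index could presumably be used to subset the rows of an expression matrix whose row names are probe IDs.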

Details of custom mappings

illuminaHumanv2listNewMappings

List all the custom re-annotation mappings provided by the package

illuminaHumanv2fullReannotation

Return all the re-annotation information as a matrix

illuminaHumanv2ARRAYADDRESS

Array Address code used to identify the probe at the bead-level

illuminaHumanv2NUID

lumi's nuID (a universal naming scheme for oligonucleotides). Reference: Du et al. (2007), Biol Direct 2:16.

illuminaHumanv2PROBESEQUENCE

The 50 base sequence for the probe

illuminaHumanv2PROBEQUALITY

Quality grade assigned to the probe: “Perfect” if it perfectly and uniquely matches the target transcript; “Good” if the probe, although imperfectly matching the target transcript, is still likely to provide a considerably sensitive signal (up to two mismatches are allowed, based on empirical evidence that the signal intensity for 50-mer probes with less than 95% identity to their respective targets is less than 50% of the signal associated with perfect matches *); “Bad” if the probe matches repeat sequences, intergenic or intronic regions, or is unlikely to provide a specific signal for any transcript; “No match” if it does not match any genomic region or transcript.

illuminaHumanv2CODINGZONE

Coding status of target sequence: intergenic / intronic / Transcriptomic (“Transcriptomic” when the target transcript is non-coding or there is no information on the coding sequence)

illuminaHumanv2GENOMICLOCATION

Probe's genomic coordinates (hg19 for human, mm9 for mouse or rn4 for rat)

illuminaHumanv2GENOMICMATCHSIMILARITY

Percentage of similarity between the probe and its best genomic match in the alignable region, taking the probe as reference

illuminaHumanv2SECONDMATCHES

Genomic coordinates of second best matches between the probe and the genome

illuminaHumanv2SECONDMATCHSIMILARITY

Percentage of similarity between the probe and its second best genomic match in the alignable region, taking the probe as reference

illuminaHumanv2TRANSCRIPTOMICMATCHSIMILARITY

Percentage of similarity between the probe and its target transcript in the alignable region, taking the probe as reference

illuminaHumanv2OTHERGENOMICMATCHES

Genomic coordinates of sequences that align with the probe as well as its main target does (in terms of number of matching nucleotides)

illuminaHumanv2REPEATMASK

Overlapping RepeatMasked sequences, with number of bases overlapped by the repeat

illuminaHumanv2OVERLAPPINGSNP

Overlapping annotated SNPs

illuminaHumanv2ENTREZREANNOTATED

Entrez IDs

illuminaHumanv2ENSEMBLREANNOTATED

Ensembl IDs

illuminaHumanv2SYMBOLREANNOTATED

Gene symbol derived by re-annotation

illuminaHumanv2REPORTERGROUPID

For probes marked as controls in Illumina's annotation file, this gives the type of control

illuminaHumanv2REPORTERGROUPNAME

Usually a more informative name for the control type
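The two-mismatch allowance in the PROBEQUALITY description above follows from the 95% identity threshold for 50-mer probes. The short calculation below illustrates this, assuming identity is simply the fraction of matching bases over the probe length (the page does not state how identity was computed).

```r
# Percent identity of a 50-base probe for 0 to 3 mismatches.
# Assumption: identity = matching bases / probe length.
probe_length <- 50
mismatches <- 0:3
identity_pct <- 100 * (probe_length - mismatches) / probe_length

# Which mismatch counts stay at or above the 95% threshold?
identity_table <- data.frame(
  mismatches   = mismatches,
  identity_pct = identity_pct,
  at_least_95  = identity_pct >= 95
)
print(identity_table)
```

Two mismatches leave 96% identity, while a third drops the probe to 94%, below the threshold.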

References

http://remoat.sysbiol.cam.ac.uk

Barbosa-Morais et al. (2010) A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data. Nucleic Acids Research

Examples


	##Load the annotation package
	library(illuminaHumanv2.db)

	##See what new mappings are available

	illuminaHumanv2listNewMappings()
	

        x <- illuminaHumanv2PROBEQUALITY

        mapped_probes <- mappedkeys(x)
        # Convert to a list
        xx <- as.list(x[mapped_probes])
        if(length(xx) > 0) {
          # Get the PROBEQUALITY for the first five probes
          xx[1:5]
          # Get the first one
          xx[[1]]
        }


	##Overall table of qualities
	table(unlist(xx))

	

        x <- illuminaHumanv2ARRAYADDRESS

        mapped_probes <- mappedkeys(x)
        # Convert to a list
        xx <- as.list(x[mapped_probes])
        if(length(xx) > 0) {
          # Get the ARRAYADDRESS for the first five probes
          xx[1:5]
          # Get the first one
          xx[[1]]
        }

	##Can do the mapping from array address to illumina ID using a revmap
	
	y<- revmap(illuminaHumanv2ARRAYADDRESS)
	
        mapped_probes <- mappedkeys(y)
        # Convert to a list
        yy <- as.list(y[mapped_probes])
        if(length(yy) > 0) {
          # Get the Illumina IDs for the first five array addresses
          yy[1:5]
          # Get the first one
          yy[[1]]
        }
	


        x <- illuminaHumanv2CODINGZONE

        mapped_probes <- mappedkeys(x)
        # Convert to a list
        xx <- as.list(x[mapped_probes])
        if(length(xx) > 0) {
          # Get the CODINGZONE for the first five probes
          xx[1:5]
          # Get the first one
          xx[[1]]
        }

        x <- illuminaHumanv2PROBESEQUENCE

        mapped_probes <- mappedkeys(x)
        # Convert to a list
        xx <- as.list(x[mapped_probes])
        if(length(xx) > 0) {
          # Get the PROBESEQUENCE for the first five probes
          xx[1:5]
          # Get the first one
          xx[[1]]
        }
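As a rough guide to how many probes survive quality filtering, the counts printed by table(unlist(xx)) in the Results can be collapsed to the four documented grades. The sketch below copies those counts by hand and assumes that trailing asterisks denote variants of the base grade, which this page does not confirm.

```r
# Counts copied from the table of qualities shown in the Results
quality_counts <- c(
  "Bad" = 15528, "Good" = 792, "Good***" = 180, "Good****" = 317,
  "No match" = 2674, "Perfect" = 20671, "Perfect***" = 7709,
  "Perfect****" = 1610
)

# Assumption: strip trailing asterisks to recover the base grade
base_grade <- sub("\\*+$", "", names(quality_counts))

# Sum the counts within each base grade
collapsed <- tapply(quality_counts, base_grade, sum)
print(collapsed)

# Probes retained when keeping only Perfect or Good
retained <- sum(collapsed[c("Perfect", "Good")])
retained_fraction <- retained / sum(collapsed)
print(round(retained_fraction, 3))
```

Under that assumption, 31279 of the 49481 mapped probes (about 63%) would be retained.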


Results


> library(illuminaHumanv2.db)
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: org.Hs.eg.db


> ### Name: illuminaHumanv2listNewMappings
> ### Title: Custom mappings added to the package
> ### Aliases: illuminaHumanv2ARRAYADDRESS illuminaHumanv2NUID
> ###   illuminaHumanv2PROBESEQUENCE illuminaHumanv2PROBEQUALITY
> ###   illuminaHumanv2CODINGZONE illuminaHumanv2GENOMICLOCATION
> ###   illuminaHumanv2GENOMICMATCHSIMILARITY illuminaHumanv2SECONDMATCHES
> ###   illuminaHumanv2SECONDMATCHSIMILARITY
> ###   illuminaHumanv2TRANSCRIPTOMICMATCHSIMILARITY
> ###   illuminaHumanv2OTHERGENOMICMATCHES illuminaHumanv2REPEATMASK
> ###   illuminaHumanv2OVERLAPPINGSNP illuminaHumanv2ENTREZREANNOTATED
> ###   illuminaHumanv2ENSEMBLREANNOTATED illuminaHumanv2SYMBOLREANNOTATED
> ###   illuminaHumanv2listNewMappings illuminaHumanv2fullReannotation
> ###   illuminaHumanv2REPORTERGROUPNAME illuminaHumanv2REPORTERGROUPID
> ### Keywords: datasets
> 
> ### ** Examples
> 
> 
> 	##See what new mappings are available
> 
> 	illuminaHumanv2listNewMappings()
illuminaHumanv2ARRAYADDRESS()
illuminaHumanv2NUID()
illuminaHumanv2PROBEQUALITY()
illuminaHumanv2CODINGZONE()
illuminaHumanv2PROBESEQUENCE()
illuminaHumanv2SECONDMATCHES()
illuminaHumanv2OTHERGENOMICMATCHES()
illuminaHumanv2REPEATMASK()
illuminaHumanv2OVERLAPPINGSNP()
illuminaHumanv2ENTREZREANNOTATED()
illuminaHumanv2GENOMICLOCATION()
illuminaHumanv2SYMBOLREANNOTATED()
illuminaHumanv2REPORTERGROUPNAME()
illuminaHumanv2REPORTERGROUPID()
illuminaHumanv2ENSEMBLREANNOTATED()
> 	
> 
>         x <- illuminaHumanv2PROBEQUALITY
> 
>         mapped_probes <- mappedkeys(x)
>         # Convert to a list
>         xx <- as.list(x[mapped_probes])
>         if(length(xx) > 0) {
+           # Get the PROBEQUALITY for the first five probes
+           xx[1:5]
+           # Get the first one
+           xx[[1]]
+         }
[1] "Perfect***"
> 
> 
> 	##Overall table of qualities
> 	table(unlist(xx))

        Bad        Good     Good***    Good****    No match     Perfect 
      15528         792         180         317        2674       20671 
 Perfect*** Perfect**** 
       7709        1610 
> 
> 	
> 
>         x <- illuminaHumanv2ARRAYADDRESS
> 
>         mapped_probes <- mappedkeys(x)
>         # Convert to a list
>         xx <- as.list(x[mapped_probes])
>         if(length(xx) > 0) {
+           # Get the ARRAYADDRESS for the first five probes
+           xx[1:5]
+           # Get the first one
+           xx[[1]]
+         }
[1] "2100682"
> 
> 	##Can do the mapping from array address to illumina ID using a revmap
> 	
> 	y<- revmap(illuminaHumanv2ARRAYADDRESS)
> 	
>         mapped_probes <- mappedkeys(y)
>         # Convert to a list
>         yy <- as.list(y[mapped_probes])
>         if(length(yy) > 0) {
+           # Get the ARRAYADDRESS for the first five probes
+           yy[1:5]
+           # Get the first one
+           yy[[1]]
+         }
[1] "ILMN_1910180"
> 	
> 
> 
>         x <- illuminaHumanv2CODINGZONE
> 
>         mapped_probes <- mappedkeys(x)
>         # Convert to a list
>         xx <- as.list(x[mapped_probes])
>         if(length(xx) > 0) {
+           # Get the CODINGZONE for the first five probes
+           xx[1:5]
+           # Get the first one
+           xx[[1]]
+         }
[1] "Transcriptomic?"
> 
>         x <- illuminaHumanv2PROBESEQUENCE
> 
>         mapped_probes <- mappedkeys(x)
>         # Convert to a list
>         xx <- as.list(x[mapped_probes])
>         if(length(xx) > 0) {
+           # Get the PROBESEQUENCE for the first five probes
+           xx[1:5]
+           # Get the first one
+           xx[[1]]
+         }
[1] "ACACCTTCAGGAGGGAAGCCCTTATTTCTGGGTTGAACTCCCCTTCCATG"
> 