Last data update: 2014.03.03

as.FAMTdata    R Documentation

Create a 'FAMTdata' object from an expression, covariates and annotations dataset

Description

The function creates a 'FAMTdata' object containing the expression dataset and, if provided, the covariates and annotations datasets. It checks that the data frames are consistent with one another. Missing expression values can then be imputed.

Usage

as.FAMTdata(expression, covariates = NULL, annotations = NULL, idcovar = 1, 
idannot = NULL, na.action=TRUE)

Arguments

expression

An expression data frame with genes in rows and arrays in columns. The arrays are identified by the column names.

covariates

An optional data frame with arrays in rows, and covariates in columns. One column must contain the array identification (NULL by default).

annotations

An optional data frame containing information on the genes (NULL by default).

idcovar

The column number corresponding to the array identification in the covariates data frame (1 by default)

idannot

The column number corresponding to the gene identification in the annotations data frame (NULL by default).

na.action

If TRUE (default value), missing expression data are imputed using nearest neighbor averaging (the impute.knn function of the 'impute' package).
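The choice of idcovar can be checked before calling as.FAMTdata. The following is a minimal base R sketch, not part of the FAMT package: it looks for the covariates column whose values match the column names of the expression data frame. The objects expr_toy and covar_toy and their contents are hypothetical toy data.

```r
# Hypothetical toy inputs (not taken from the FAMT package).
expr_toy <- data.frame(
  F10 = c(0.1, 0.4),
  F11 = c(0.3, 0.2),
  F12 = c(0.8, 0.5)
)
covar_toy <- data.frame(
  Class     = c("F", "L", "F"),
  ArrayName = c("F10", "F11", "F12"),
  Weight    = c(2284, 2371, 2474),
  stringsAsFactors = FALSE
)

# For each covariates column, test whether its values are exactly
# the array names used as column names in the expression data.
is_id_column <- vapply(
  covar_toy,
  function(column) setequal(as.character(column), colnames(expr_toy)),
  logical(1)
)

# Position of the matching column: a candidate value for idcovar.
idcovar_guess <- unname(which(is_id_column))
idcovar_guess
```

Here the array names sit in the second column, so idcovar = 2 would be the value to pass, as in the Examples section below.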

Details

The as.FAMTdata function creates a single R object containing the data stored in one mandatory data frame and two optional data frames:

- 'expression' (mandatory): m rows (one per test) and n columns (n is the sample size), containing the observations of the responses.
- 'covariates' (optional): n rows and at least 2 columns, one giving the array identification used to match 'expression' and 'covariates', and the others containing the observations of at least one covariate.
- 'annotations' (optional): m rows and at least one column identifying the variables (ID). It can be provided to help interpret the factors.
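The layout described above can be illustrated with toy data. This is a sketch in base R under stated assumptions: all object names (expr_toy, covar_toy, annot_toy) and values are made up, and the checks shown are an illustration of the expected dimensions, not the checks performed internally by as.FAMTdata.

```r
# Toy data with hypothetical names; only the shapes follow the description.
set.seed(1)
m <- 5  # number of tests (genes)
n <- 4  # sample size (arrays)

# Expression: m rows, n columns, arrays identified by column names.
expr_toy <- as.data.frame(matrix(rnorm(m * n), nrow = m, ncol = n))
colnames(expr_toy) <- paste0("A", seq_len(n))
rownames(expr_toy) <- paste0("G", seq_len(m))

# Covariates: n rows, one identification column plus at least one covariate.
covar_toy <- data.frame(
  ArrayName = colnames(expr_toy),
  Group     = c("ctl", "ctl", "trt", "trt"),
  stringsAsFactors = FALSE
)

# Annotations: m rows, an ID column, kept as character rather than factor.
annot_toy <- data.frame(
  ID   = rownames(expr_toy),
  Name = paste("gene", seq_len(m)),
  stringsAsFactors = FALSE
)

# Illustrative consistency checks on dimensions and identifiers.
dims_ok   <- nrow(covar_toy) == ncol(expr_toy) &&
             nrow(annot_toy) == nrow(expr_toy)
ids_match <- setequal(colnames(expr_toy), covar_toy$ArrayName)
c(dims_ok = dims_ok, ids_match = ids_match)
```

Running such checks first makes a mismatch between array names and covariate rows easier to locate than an error raised later in the analysis.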

Value

expression

The expression data frame

covariates

The optional covariates data frame

annotations

The optional data frame containing annotations. Gene annotations such as functional categories should be stored as character vectors, not as factors.

idcovar

The column number corresponding to the array identification in the covariates data frame (which should correspond to the column names in 'expression')

na.expr

Rows and columns of expression with missing values
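The kind of information held in na.expr can be reproduced by hand. The sketch below uses base R only and a hypothetical toy data frame (expr_toy); it illustrates how to locate the rows and columns with missing values and is not the code used inside as.FAMTdata.

```r
# Hypothetical toy expression data with two missing values.
expr_toy <- data.frame(
  A1 = c(1.2, NA, 0.7),
  A2 = c(0.5, 0.9, 1.1),
  A3 = c(NA, 2.0, 1.4)
)

# Row and column index of every missing cell.
na_pos <- which(is.na(as.matrix(expr_toy)), arr.ind = TRUE)

# Distinct rows and columns that contain at least one missing value.
na_rows <- sort(unique(unname(na_pos[, "row"])))
na_cols <- sort(unique(unname(na_pos[, "col"])))

list(
  "Rows with missing values"    = na_rows,
  "Columns with missing values" = na_cols
)
```

For the bundled chicken data both vectors are empty, which matches the integer(0) output shown in the Results section.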

Note

The object produced by the as.FAMTdata function has class 'FAMTdata'. We recommend summarizing FAMT data with the summaryFAMT function.

Author(s)

David Causeur

See Also

summaryFAMT

Examples

# The data are divided into one mandatory data-frame, the gene expressions, 
#  and two optional datasets: the covariates, and the annotations.

# The expression dataset with 9893 rows (genes) and 43 columns (arrays)
#  containing the observations of the responses.
# The covariates dataset with 43 rows (arrays) and 6 columns: 
#  the second column gives the specification to match 'expression' 
#  and 'covariates' (array identification), the other ones contain
#  the observations of covariates.
# The annotations dataset contains 9893 rows (genes) and 
#  6 columns to help interpret the factors; the first one (ID) 
#  identifies the variables (genes). 

data(expression)
data(covariates)
data(annotations)

# Create the 'FAMTdata'
############################################
chicken = as.FAMTdata(expression,covariates,annotations,idcovar=2)
# 'FAMTdata' summary
summaryFAMT(chicken)

Results


> library(FAMT)
Loading required package: mnormt
Loading required package: impute
> data(expression)
> data(covariates)
> data(annotations)
> 
> # Create the 'FAMTdata'
> ############################################
> chicken = as.FAMTdata(expression,covariates,annotations,idcovar=2)
$`Rows with missing values`
integer(0)

$`Columns with missing values`
integer(0)

> # 'FAMTdata' summary
> summaryFAMT(chicken)
$expression
$expression$`Number of tests`
[1] 9893

$expression$`Sample size`
[1] 43


$covariates
 AfClass   ArrayName        Mere    Lot         Pds9s            Af          
 F :18   F10    : 1   GMB05555:10   L2:16   Min.   :1994   Min.   :-25.5397  
 L :19   F11    : 1   GMB05625: 7   L3:11   1st Qu.:2284   1st Qu.: -8.0042  
 NC: 6   F12    : 1   GMB05562: 5   L4: 8   Median :2371   Median :  2.7166  
         F13    : 1   GMB05599: 5   L5: 8   Mean   :2370   Mean   :  0.2365  
         F14    : 1   GMB05554: 4           3rd Qu.:2474   3rd Qu.:  8.6037  
         F15    : 1   GMB05589: 4           Max.   :2618   Max.   : 18.1024  
         (Other):37   (Other) : 8                                            

$annotations
         ID      
 RIGG00001:   1  
 RIGG00002:   1  
 RIGG00003:   1  
 RIGG00005:   1  
 RIGG00006:   1  
 RIGG00007:   1  
 (Other)  :9887  
                                                                           Name     
 Weakly similar to Q95JC9 (Q95JC9) Basic proline-rich protein                :  11  
 No Match                                                                    :   8  
 Weakly similar to Q90811 (Q90811) Hypothetical 28.6 kDa protein (Fragment)  :   8  
 Weakly similar to Q9DDJ7 (Q9DDJ7) Retinoblastoma tumor suppressor (Fragment):   8  
 Weakly similar to Q08525 (Q08525) Reverse transcriptase                     :   6  
 Weakly similar to Q8MW53 (Q8MW53) Precollagen-D                             :   6  
 (Other)                                                                     :9846  
     Block           Column           Row            Length     
 Min.   : 1.00   Min.   : 1.00   Min.   : 1.00   Min.   :60.00  
 1st Qu.:13.00   1st Qu.: 6.00   1st Qu.: 6.00   1st Qu.:70.00  
 Median :25.00   Median :11.00   Median :12.00   Median :70.00  
 Mean   :24.87   Mean   :11.04   Mean   :11.63   Mean   :69.57  
 3rd Qu.:37.00   3rd Qu.:16.00   3rd Qu.:17.00   3rd Qu.:70.00  
 Max.   :48.00   Max.   :21.00   Max.   :22.00   Max.   :75.00  
                                                                
