R: Create a 'FAMTdata' object from an expression, covariates and...
as.FAMTdata
R Documentation
Create a 'FAMTdata' object from an expression, covariates and annotations dataset
Description
The function creates a 'FAMTdata' object containing the expression dataset and, if provided, the covariates and annotations datasets. It checks that the data frames are consistent with one another. Missing expression values can then be imputed.
Arguments
expression
An expression data frame with genes in rows and arrays in columns. The arrays are identified by the column names.
covariates
An optional data frame with arrays in rows, and covariates in columns. One column must contain the array identification (NULL by default).
annotations
An optional data frame containing information on the genes (NULL by default).
idcovar
The column number corresponding to the array identification in the covariates data frame (1 by default)
idannot
The column number corresponding to the gene identification in the annotations data frame (NULL by default).
na.action
If TRUE (the default), missing expression data are imputed using nearest neighbor averaging (the impute.knn function of the 'impute' package).
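Since idcovar must point at the covariates column that holds the array identifiers, it can help to locate that column programmatically before calling as.FAMTdata. The following is a minimal base-R sketch on made-up toy data; the helper name find_idcovar and the toy data frames are hypothetical and are not part of the FAMT package.

```r
# Toy data (hypothetical): 3 genes measured on 4 arrays.
toy_expression <- data.frame(
  A1 = c(1.2, 0.4, 2.1),
  A2 = c(0.9, 0.5, 1.8),
  A3 = c(1.1, 0.3, 2.4),
  A4 = c(1.0, 0.6, 2.0),
  row.names = c("g1", "g2", "g3")
)
toy_covariates <- data.frame(
  Group = c("a", "b", "a", "b"),
  ArrayName = c("A1", "A2", "A3", "A4"),
  Weight = c(2.1, 2.4, 2.2, 2.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the indices of the covariate columns whose
# values are exactly the expression column names, in any order.
find_idcovar <- function(expression, covariates) {
  is_id <- vapply(
    covariates,
    function(column) {
      length(column) == ncol(expression) &&
        setequal(as.character(column), colnames(expression))
    },
    logical(1)
  )
  unname(which(is_id))
}

toy_idcovar <- find_idcovar(toy_expression, toy_covariates)
print(toy_idcovar)
```

If the helper returns no index, or more than one, the array identification column has to be chosen by hand.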
Details
The as.FAMTdata function creates a single R object containing the data stored in one mandatory and two optional data frames:
- the mandatory 'expression' dataset, with m rows (one per test) and n columns (n is the sample size), containing the observations of the responses;
- the optional 'covariates' dataset, with n rows and at least 2 columns: one giving the array identification used to match 'expression' and 'covariates', and the others containing the observations of at least one covariate;
- the optional 'annotations' dataset, with m rows and at least one column (ID) identifying the variables, which can be provided to help interpret the factors.
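The dimension and identifier constraints described above can be sketched as a small pre-flight check. This is only an illustration in base R under the stated m-by-n layout, not the package's actual implementation; the function check_famt_inputs and the toy data frames are hypothetical.

```r
# Toy data (hypothetical): m = 3 genes, n = 4 arrays.
toy_expression <- data.frame(
  A1 = c(1.2, 0.4, 2.1), A2 = c(0.9, 0.5, 1.8),
  A3 = c(1.1, 0.3, 2.4), A4 = c(1.0, 0.6, 2.0),
  row.names = c("g1", "g2", "g3")
)
toy_covariates <- data.frame(
  Group = c("a", "b", "a", "b"),
  ArrayName = c("A1", "A2", "A3", "A4"),
  stringsAsFactors = FALSE
)
toy_annotations <- data.frame(
  ID = c("g1", "g2", "g3"),
  Name = c("first", "second", "third"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect human-readable problems instead of stopping,
# so that all inconsistencies are reported at once.
check_famt_inputs <- function(expression, covariates = NULL,
                              annotations = NULL, idcovar = 1) {
  problems <- character(0)
  if (!is.null(covariates)) {
    if (nrow(covariates) != ncol(expression)) {
      problems <- c(problems, "covariates must have one row per array")
    } else if (!setequal(as.character(covariates[[idcovar]]),
                         colnames(expression))) {
      problems <- c(problems,
                    "array identifiers do not match the expression column names")
    }
  }
  if (!is.null(annotations) && nrow(annotations) != nrow(expression)) {
    problems <- c(problems, "annotations must have one row per gene")
  }
  problems
}

ok <- check_famt_inputs(toy_expression, toy_covariates, toy_annotations,
                        idcovar = 2)
wrong_id <- check_famt_inputs(toy_expression, toy_covariates, toy_annotations,
                              idcovar = 1)
print(ok)
print(wrong_id)
```

An empty result means the three toy data frames agree on dimensions and array identifiers; the second call shows what happens when idcovar points at the wrong column.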
Value
expression
The expression data frame
covariates
The optional covariates data frame
annotations
The optional data frame containing annotations. Gene annotations such as functional categories should be stored as character vectors, not as factors.
idcovar
The column number corresponding to the array identification in the covariates data frame (the identifiers should correspond to the column names of 'expression').
na.expr
Rows and columns of expression with missing values
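To make the na.expr component concrete, here is a base-R sketch of how the rows and columns containing missing values could be located in a toy expression data frame. The toy data are made up, and the list layout merely mirrors the labels printed by as.FAMTdata; the exact internal structure of na.expr is an assumption.

```r
# Toy expression data (hypothetical) with two missing values.
toy_expression <- data.frame(
  A1 = c(1.2, NA, 2.1),
  A2 = c(0.9, 0.5, 1.8),
  A3 = c(1.1, 0.3, NA),
  row.names = c("g1", "g2", "g3")
)

# Logical matrix flagging the missing cells.
missing_cells <- is.na(as.matrix(toy_expression))

# Indices of the rows (genes) and columns (arrays) with at least one NA.
na_rows <- unname(which(rowSums(missing_cells) > 0))
na_cols <- unname(which(colSums(missing_cells) > 0))

toy_na_expr <- list(
  "Rows with missing values" = na_rows,
  "Columns with missing values" = na_cols
)
print(toy_na_expr)
```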
Note
The object produced by the as.FAMTdata function has class 'FAMTdata'. We recommend summarizing a 'FAMTdata' object with the summaryFAMT function.
Author(s)
David Causeur
See Also
summaryFAMT
Examples
# The data are divided into one mandatory data-frame, the gene expressions,
# and two optional datasets: the covariates, and the annotations.
# The expression dataset with 9893 rows (genes) and 43 columns (arrays)
# containing the observations of the responses.
# The covariates dataset with 43 rows (arrays) and 6 columns:
# the second column gives the specification to match 'expression'
# and 'covariates' (array identification), the other ones contain
# the observations of covariates.
# The annotations dataset contains 9893 rows (genes) and
# 6 columns to help interpret the factors; the first one (ID)
# identifies the variables (genes).
library(FAMT)
data(expression)
data(covariates)
data(annotations)
# Create the 'FAMTdata' object; the array identifiers are in column 2 of covariates
chicken <- as.FAMTdata(expression, covariates, annotations, idcovar = 2)
# 'FAMTdata' summary
summaryFAMT(chicken)
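As a follow-up to the example above, the components listed under Value can be inspected directly on the resulting object. This sketch assumes the FAMT package and its bundled datasets are installed; it is guarded so that it does nothing otherwise, and it assumes the components are accessible with the `$` operator under the names given in the Value section.

```r
# Runs only when the FAMT package is available.
if (requireNamespace("FAMT", quietly = TRUE)) {
  library(FAMT)
  data(expression)
  data(covariates)
  data(annotations)

  chicken <- as.FAMTdata(expression, covariates, annotations, idcovar = 2)

  # The Note section states that the object has class 'FAMTdata'.
  print(inherits(chicken, "FAMTdata"))

  # Components named in the Value section (assumed to be list elements).
  print(dim(chicken$expression))
  print(chicken$idcovar)
  print(chicken$na.expr)
}
```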
Results
> library(FAMT)
Loading required package: mnormt
Loading required package: impute
> data(expression)
> data(covariates)
> data(annotations)
>
> # Create the 'FAMTdata'
> ############################################
> chicken = as.FAMTdata(expression,covariates,annotations,idcovar=2)
$`Rows with missing values`
integer(0)
$`Columns with missing values`
integer(0)
> # 'FAMTdata' summary
> summaryFAMT(chicken)
$expression
$expression$`Number of tests`
[1] 9893
$expression$`Sample size`
[1] 43
$covariates
 AfClass   ArrayName        Mere      Lot        Pds9s             Af
 F :18    F10    : 1   GMB05555:10   L2:16   Min.   :1994   Min.   :-25.5397
 L :19    F11    : 1   GMB05625: 7   L3:11   1st Qu.:2284   1st Qu.: -8.0042
 NC: 6    F12    : 1   GMB05562: 5   L4: 8   Median :2371   Median :  2.7166
          F13    : 1   GMB05599: 5   L5: 8   Mean   :2370   Mean   :  0.2365
          F14    : 1   GMB05554: 4           3rd Qu.:2474   3rd Qu.:  8.6037
          F15    : 1   GMB05589: 4           Max.   :2618   Max.   : 18.1024
          (Other):37   (Other) : 8
$annotations
         ID
 RIGG00001:   1
 RIGG00002:   1
 RIGG00003:   1
 RIGG00005:   1
 RIGG00006:   1
 RIGG00007:   1
 (Other)  :9887
                                                                          Name
 Weakly similar to Q95JC9 (Q95JC9) Basic proline-rich protein                :  11
 No Match                                                                    :   8
 Weakly similar to Q90811 (Q90811) Hypothetical 28.6 kDa protein (Fragment)  :   8
 Weakly similar to Q9DDJ7 (Q9DDJ7) Retinoblastoma tumor suppressor (Fragment):   8
 Weakly similar to Q08525 (Q08525) Reverse transcriptase                     :   6
 Weakly similar to Q8MW53 (Q8MW53) Precollagen-D                             :   6
 (Other)                                                                     :9846
     Block           Column            Row            Length
 Min.   : 1.00   Min.   : 1.00   Min.   : 1.00   Min.   :60.00
 1st Qu.:13.00   1st Qu.: 6.00   1st Qu.: 6.00   1st Qu.:70.00
 Median :25.00   Median :11.00   Median :12.00   Median :70.00
 Mean   :24.87   Mean   :11.04   Mean   :11.63   Mean   :69.57
 3rd Qu.:37.00   3rd Qu.:16.00   3rd Qu.:17.00   3rd Qu.:70.00
 Max.   :48.00   Max.   :21.00   Max.   :22.00   Max.   :75.00