Last data update: 2014.03.03

R: Create Null Model for Association Test
nullModelR Documentation

Create Null Model for Association Test

Description

Method for creating a null model that can be used for association testing using assocTest

Usage

## S4 method for signature 'formula,data.frame'
nullModel(X, y, data,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          n.resampling=0,
          type.resampling=c("bootstrap", "permutation"),
          adj=c("automatic", "none", "force"), adjExact=FALSE,
          n.resampling.adj=10000, checkData=TRUE)
## S4 method for signature 'formula,missing'
nullModel(X, y, data,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          n.resampling=0,
          type.resampling=c("bootstrap", "permutation"),
          adj=c("automatic", "none", "force"), adjExact=FALSE,
          n.resampling.adj=10000, checkData=TRUE)
## S4 method for signature 'matrix,numeric'
nullModel(X, y,
          type=c("automatic", "logistic", "linear"), ...)
## S4 method for signature 'matrix,factor'
nullModel(X, y,
          type=c("automatic", "logistic", "linear"), ...)
## S4 method for signature 'missing,numeric'
nullModel(X, y,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          ...)
## S4 method for signature 'missing,factor'
nullModel(X, y,
          type=c("automatic", "logistic", "linear", "bernoulli"),
          ...)

Arguments

X

a formula or matrix

y

if the formula interface is used, y can be used to pass a data frame with the table in which both covariates and traits are contained (alternatively, the data argument can be used for that purpose). The other methods (if X is not a formula) expect y to be the trait vector. Trait vectors can either be numeric vectors or a factor with two levels (see details below).

data

for consistency with standard R methods from the stats package, the data frame can also be passed to nullModel via the data argument. In this case, the y must be empty. If y is specified, data is ignored.

type

type of model to train (see details below)

n.resampling

number of null model residuals to sample; set to zero (default) to turn resampling off; resampling is not supported for plain trait vectors without covariates

type.resampling

method how to sample null model residuals; the choice “permutation” refers to simple random permutations of the model's residuals. If “bootstrap” is chosen (default), the following strategy is applied for linear models (continuous trait): residuals are sampled as normally distributed values with mean 0 and the same standard deviation as the model's residuals. For logistic models (binary trait), the choice “bootstrap” selects the same bootstrapping method that is implemented in the SKAT package.

adj

whether or not to use small sample correction for logistic models (binary trait with covariates). The choice “none” turns off small sample correction. If “force” is chosen, small sample correction is turned on unconditionally. If “automatic” is chosen (default), small sample correction is turned on if the number of samples does not exceed 2,000. This argument is ignored for any type of model except “logistic” and small sample correction is switched off.

adjExact

in case small sample correction is switched on (see above), this argument indicates whether or not the exact square root of the matrix P_0 should be pre-computed (see Subsection 9.5 of the package vignette). The default is FALSE. This argument is ignored if small sample correction is not switched on.

n.resampling.adj

number of null model residuals to sample for the adjustment of higher moments; ignored if small sample correction is switched off.

checkData

if FALSE, only a very limited set of input checks is performed. The purpose of this option is to save computational effort for repeated input checks if the function is called from a function that has already performed input checks. The default is TRUE. Only change to FALSE if you know what you are doing!

...

all other parameters are passed on to the nullModel method with signature formula,data.frame.

Details

The podkat package assumes a mixed model in which the trait under investigation depends both on covariates (if any) and the genotype. The nullModel method models the relationship between the trait and the covariates (if any) without taking the genotype into account, which corresponds to the null assumption that the trait and the genotype are independent. Therefore, we speak of null models. The following types of models are presently available:

Linear model (type “linear”):

a linear model is trained for a continuous trait and a given set of covariates (if any); this is done by standard linear regression using the lm function.

Logistic linear model (type “logistic”):

a generalized linear model is trained for a binary trait and a given set of covariates (if any); this is done by logistic regression using the glm function.

Bernoulli-distributed trait (type “bernoulli”):

a binary trait without covariates is interpreted as instances of a simple Bernoulli process with p being the relative frequencies 1's/cases.

The type argument can be used to select the type of model, where the following restrictions apply:

  • For linear models, the trait vector must be numerical. Factors/factor columns are not accepted.

  • For logistic models and Bernoulli-distributed traits, both numerical vectors and factors are acceptable. In any case, only 0's (controls) and 1's (cases) are accepted. Furthermore, nullModel quits with an error if the trait shows no variation. In other words, trait vectors that only contain 0's or only contain 1's are not accepted (as association testings makes little sense for such traits anyway).

The following interfaces are available to specify the traits and the covariates (if any):

Formula interface:

the first argument X can be a formula that specifies the trait vector/column, the covariate matrix/columns (if any), and the intercept (if any). If neither the y argument nor the data argument is specified, nullModel searches the environment from which the function has been called. This interface is largely analogous to the functions lm and glm.

Trait vector without covariates:

if the X argument is omitted and y is a numeric vector or factor, y is interpreted as trait vector, and a null model is created from y without covariates. Linear and logistic models are trained with an intercept. For type “bernoulli”, the trait vector is written to the output object as is.

Trait vector plus covariate matrix:

if the X argument is a matrix and y is a numeric vector or factor, y is interpreted as trait vector and X is interpreted as covariate matrix. In this case, linear and logistic models are trained as (generalized) linear regressors that predict the trait from the covariates plus an intercept. The type “bernoulli” is not available for this variant, since this type of model cannot consider covariates.

All nullModel methods also support the choice type="automatic". In this case, nullModel guesses the most reasonable type of model in the following way: If the trait vector/column is a factor or a numeric vector containing only 0's and 1's (where both values must be present, as noted above already), the trait is supposed to be binary and the type “logistic” is assumed, unless the following conditions are satisfied:

  1. The number of samples does not exceed 100.

  2. No intercept and no covariates have been specified. This condition can be met by supplying an empty model to the formula interface (e.g. y ~ 0) or by supplying the trait vector as argument y while omitting X.

If these two conditions are fulfilled for a binary trait, nullModel chooses the type “bernoulli”. If the trait is not binary and the trait vector/column is numeric, nullModel assumes type “linear”.

For consistency with the SKAT package, the podkat package also offers resampling, i.e. a certain number of vectors of residuals are sampled according to the null model. This can be done when training the null model by setting the n.resampling parameter (number of residual vectors that are sampled) to a value larger than 0. Then, when association testing is performed, p-values are computed also for all these sampled residuals, and an additional estimated p-value is computed as the relative frequency of p-values of sampled residuals that are at least as significant as the test's p-value. The procedure to sample residuals is controlled with the type.resampling argument (see above).

For logistic models (type “logistic”), assocTest offers the small sample correction as introduced by Lee et al. (2012). If the adjustment of higher moments should be applied, some preparations need to be made already when training the null model. Which preparations are carried out, can be controlled by the arguments adj, adjExact, n.resampling.adj, and type.resampling (see descriptions of arguments above and Subsection 9.5 of the package vignette).

If any missing values are found in the trait vector/column or the covariate matrix/columns, the respective samples are omitted from the resulting model (which is the standard behavior of lm and glm anyway). The indices of the omitted samples are stored in the na.omit slot of the returned NullModel object.

Value

returns a NullModel object

Author(s)

Ulrich Bodenhofer bodenhofer@bioinf.jku.at

References

http://www.bioinf.jku.at/software/podkat

Lee, S., Emond, M. J., Bamshad, M. J., Barnes, K. C., Rieder, M. J., Nickerson, D. A., NHLBI Exome Sequencing Project - ESP Lung Project Team, Christiani, D. C., Wurfel, M. M., and Lin, X. (2012) Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am. J. Hum. Genet. 91, 224-237. DOI: 10.1016/j.ajhg.2012.06.007.

See Also

NullModel, lm, glm

Examples

## read phenotype data from CSV file (continuous trait + covariates)
phenoFile <- system.file("examples/example1lin.csv", package="podkat")
pheno <-read.table(phenoFile, header=TRUE, sep=",")

## train null model with all covariates in data frame 'pheno'
model <- nullModel(y ~ ., pheno)
model
length(model)
residuals(model)

## read phenotype data from CSV file (binary trait + covariates)
phenoFile <- system.file("examples/example1log.csv", package="podkat")
pheno <-read.table(phenoFile, header=TRUE, sep=",")

## train null model with all covariates in data frame 'pheno'
model <- nullModel(y ~ ., pheno)
model
length(model)
residuals(model)

## "train" simple Bernoulli model on a subset of 100 samples
model <- nullModel(y ~ 0, pheno[1:100, ])
model
length(model)
residuals(model)

## alternatively, use the interface that only supplies the
## trait vector
model <- nullModel(y=pheno[1:100, ]$y)
model

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(podkat)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/podkat/nullModel-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: nullModel
> ### Title: Create Null Model for Association Test
> ### Aliases: nullModel nullModel-methods nullModel,matrix,numeric-method
> ###   nullModel,matrix,factor-method nullModel,missing,numeric-method
> ###   nullModel,missing,factor-method nullModel,formula,missing-method
> ###   nullModel,formula,data.frame-method
> ### Keywords: methods
> 
> ### ** Examples
> 
> ## read phenotype data from CSV file (continuous trait + covariates)
> phenoFile <- system.file("examples/example1lin.csv", package="podkat")
> pheno <-read.table(phenoFile, header=TRUE, sep=",")
> 
> ## train null model with all covariates in data frame 'pheno'
> model <- nullModel(y ~ ., pheno)
> model
Linear model:
	Number of covariates: 2 (+ intercept)
	Number of samples: 200
	Variance of residuals: 1.541756 
	No resampling
> length(model)
[1] 200
> residuals(model)
         S1          S2          S3          S4          S5          S6 
 0.39014181 -1.39300779  1.85400931 -1.61205229  1.33408378 -0.38730241 
         S7          S8          S9         S10         S11         S12 
-1.14282631 -0.96029056 -1.51834011 -0.54827192 -1.01348240 -1.55758254 
        S13         S14         S15         S16         S17         S18 
 1.53464252  1.07833976  1.60811676  0.19654847 -2.06523017  1.90660882 
        S19         S20         S21         S22         S23         S24 
-0.72476781  2.37107646 -0.55910036  1.20271608 -0.62130314  0.55283759 
        S25         S26         S27         S28         S29         S30 
 0.84414011  0.55622025  0.35151060  1.33753425 -2.19567386  0.37039083 
        S31         S32         S33         S34         S35         S36 
-0.68344743  0.44499220 -0.10213178 -0.62772595 -2.48237393 -1.70344322 
        S37         S38         S39         S40         S41         S42 
 0.19996009  0.48322856  0.61151372  0.17842953  0.94216971  3.76197892 
        S43         S44         S45         S46         S47         S48 
 2.63255560 -0.32488929  0.18098393 -0.44430856  0.14643080  1.06171198 
        S49         S50         S51         S52         S53         S54 
 0.87117521  0.15183686  0.19655101 -2.74400944 -1.94529193  0.40266285 
        S55         S56         S57         S58         S59         S60 
-1.75012570 -0.23124708  1.05526112  0.01422120 -1.40387189 -0.19500011 
        S61         S62         S63         S64         S65         S66 
-2.43082308  0.30497638 -0.22147083 -0.45480724 -0.79988329  1.70626630 
        S67         S68         S69         S70         S71         S72 
-0.74353686  0.80805709 -0.87785569  0.97984313 -0.61811470 -0.83171062 
        S73         S74         S75         S76         S77         S78 
 0.37480549  0.44021835  0.81507998  1.15273351  1.37346569  1.16607400 
        S79         S80         S81         S82         S83         S84 
 0.90283624  0.06793032  0.58744320  0.17760469 -1.64128972  1.41244134 
        S85         S86         S87         S88         S89         S90 
-1.48552498 -0.19755143  0.27469699 -0.86674652  2.09286110 -0.40525504 
        S91         S92         S93         S94         S95         S96 
 1.70646964  0.72383468  2.20365021 -1.28028070 -0.95019213 -0.85546495 
        S97         S98         S99        S100        S101        S102 
 0.79538547  0.05156775  1.37717389 -2.01649232  0.50503219  0.80842013 
       S103        S104        S105        S106        S107        S108 
 2.32195025 -1.51964043 -1.04327597  0.78113904 -0.03700412 -3.06370053 
       S109        S110        S111        S112        S113        S114 
 2.06301631 -1.78171375 -1.07629035 -0.30388307  2.43209814 -0.47550116 
       S115        S116        S117        S118        S119        S120 
 0.09150307  0.46828691  0.79631416  0.58181695 -0.23379269 -0.87663031 
       S121        S122        S123        S124        S125        S126 
 1.84642167 -1.79884464 -2.03370937  0.42806331 -1.09712422 -0.35465270 
       S127        S128        S129        S130        S131        S132 
-0.42796367 -0.52457553  2.39742586  0.79145751 -1.69423801  1.22795810 
       S133        S134        S135        S136        S137        S138 
 3.38083228  0.42832027  1.33024311  0.03614366 -0.03307896  0.19840447 
       S139        S140        S141        S142        S143        S144 
-0.82528056  0.47121330 -1.05624364 -0.30265278 -0.09305519 -0.76627245 
       S145        S146        S147        S148        S149        S150 
-0.15996942 -1.11034339 -0.13947800  0.04442201  1.31156104  0.05773790 
       S151        S152        S153        S154        S155        S156 
-1.51706512 -0.48340823 -1.28372951  1.74455205  0.18559217  0.53506101 
       S157        S158        S159        S160        S161        S162 
-0.31675682  1.74486201 -0.73952458  1.60615030 -2.43569022 -0.78778190 
       S163        S164        S165        S166        S167        S168 
-0.85470255 -0.60724082 -2.13869012 -2.23853120 -0.06356836 -0.80209005 
       S169        S170        S171        S172        S173        S174 
-0.42275366 -0.27730446  0.21477973 -0.99495669 -0.13693821  1.10726126 
       S175        S176        S177        S178        S179        S180 
-0.75809829  3.25898566  0.49555921 -1.63844984 -0.60092204  1.20955023 
       S181        S182        S183        S184        S185        S186 
 0.52329060  0.40950944  0.06245806  1.15603490  1.48270760  0.20813829 
       S187        S188        S189        S190        S191        S192 
-2.41123531 -1.10379879  1.36808303 -1.12515020  0.53708718  0.62900469 
       S193        S194        S195        S196        S197        S198 
-0.10175562  0.81119248 -0.76371545  0.43234109 -0.57459961 -2.09055752 
       S199        S200 
 0.10457684  0.76547256 
> 
> ## read phenotype data from CSV file (binary trait + covariates)
> phenoFile <- system.file("examples/example1log.csv", package="podkat")
> pheno <-read.table(phenoFile, header=TRUE, sep=",")
> 
> ## train null model with all covariates in data frame 'pheno'
> model <- nullModel(y ~ ., pheno)
small sample correction applied
> model
Logistic model:
	Number of covariates: 2 (+ intercept)
	Number of samples: 200
	Number of positives (cases): 10 
	No resampling
	Adjustment of higher moments: 10000 repeats (bootstrap)
> length(model)
[1] 200
> residuals(model)
          S1           S2           S3           S4           S5           S6 
-0.045393031 -0.167930303 -0.057814107 -0.003955454 -0.027595389 -0.105811682 
          S7           S8           S9          S10          S11          S12 
-0.019988648 -0.034810400 -0.033473174 -0.015009105 -0.075179466 -0.024394999 
         S13          S14          S15          S16          S17          S18 
-0.015877701 -0.068399381 -0.062456017 -0.013593386 -0.073614568 -0.019517811 
         S19          S20          S21          S22          S23          S24 
-0.061584208 -0.023647400 -0.054074282 -0.181271604 -0.068055553  0.974284963 
         S25          S26          S27          S28          S29          S30 
-0.145329437 -0.002767212 -0.012294228 -0.070628884 -0.030155590 -0.060535354 
         S31          S32          S33          S34          S35          S36 
-0.037216658 -0.108211542 -0.036040608 -0.041366475 -0.047501475 -0.059594706 
         S37          S38          S39          S40          S41          S42 
-0.023044785 -0.009809251 -0.041996919 -0.076698374 -0.047202859 -0.010578202 
         S43          S44          S45          S46          S47          S48 
-0.018284108 -0.063786098 -0.019673563 -0.012044180 -0.007368971 -0.011730486 
         S49          S50          S51          S52          S53          S54 
-0.067056953 -0.047695312 -0.044867274 -0.034636363 -0.010994595 -0.037609532 
         S55          S56          S57          S58          S59          S60 
-0.088949801 -0.014599940 -0.016529995 -0.072410744 -0.074775556 -0.033188920 
         S61          S62          S63          S64          S65          S66 
-0.061430424 -0.120540929  0.761479301 -0.022917240 -0.051051866 -0.041706933 
         S67          S68          S69          S70          S71          S72 
-0.006453105 -0.041027806 -0.030991764 -0.023044875 -0.006887752 -0.011297784 
         S73          S74          S75          S76          S77          S78 
-0.028909237 -0.021715294 -0.049836443 -0.046737449 -0.055060887 -0.018219571 
         S79          S80          S81          S82          S83          S84 
-0.032564091  0.956459436 -0.111841011 -0.045706689 -0.050893870  0.859442253 
         S85          S86          S87          S88          S89          S90 
-0.015956008 -0.231147686 -0.024650699 -0.016316900 -0.026612112 -0.048460284 
         S91          S92          S93          S94          S95          S96 
-0.053143941 -0.107022942 -0.037599987 -0.063419966 -0.046021957 -0.019995950 
         S97          S98          S99         S100         S101         S102 
-0.042263649 -0.014369391 -0.046402844 -0.008416013 -0.017703781 -0.005619499 
        S103         S104         S105         S106         S107         S108 
-0.045175634 -0.056641195 -0.116579909 -0.108218214 -0.037370474 -0.078280479 
        S109         S110         S111         S112         S113         S114 
-0.132643458 -0.018078647 -0.021009766 -0.039629281 -0.016528544 -0.052219716 
        S115         S116         S117         S118         S119         S120 
-0.006765629 -0.009588772 -0.058053275 -0.051906034 -0.028301556 -0.059163220 
        S121         S122         S123         S124         S125         S126 
-0.049515752 -0.025901288 -0.184056705 -0.076222485 -0.209815897 -0.081767302 
        S127         S128         S129         S130         S131         S132 
 0.912128335 -0.008290177 -0.008097736 -0.092634082 -0.039561192 -0.041385348 
        S133         S134         S135         S136         S137         S138 
-0.018010232 -0.033250010 -0.023171433 -0.028754265 -0.029858263 -0.024295099 
        S139         S140         S141         S142         S143         S144 
-0.014854808  0.919850561 -0.006812453 -0.002898144 -0.027013842 -0.008176772 
        S145         S146         S147         S148         S149         S150 
-0.043938984 -0.012700882 -0.024753712 -0.033925675 -0.033346675 -0.003235956 
        S151         S152         S153         S154         S155         S156 
-0.148707122 -0.179165800 -0.089788756 -0.042816950 -0.009812559  0.919239709 
        S157         S158         S159         S160         S161         S162 
-0.151319500 -0.015716955 -0.003314139 -0.037802951 -0.018657507 -0.022116279 
        S163         S164         S165         S166         S167         S168 
-0.018432145 -0.074466331 -0.093264289 -0.010903914 -0.206425530 -0.043651379 
        S169         S170         S171         S172         S173         S174 
-0.089984962 -0.044163947 -0.041576509 -0.079298365 -0.069332269 -0.088450214 
        S175         S176         S177         S178         S179         S180 
-0.036839484  0.977610250 -0.095886768 -0.013925731 -0.048070097 -0.059416539 
        S181         S182         S183         S184         S185         S186 
-0.055210518 -0.050896853 -0.023679327 -0.129357090  0.848714795 -0.019374720 
        S187         S188         S189         S190         S191         S192 
-0.040122393 -0.038717520 -0.013846786 -0.007357227 -0.023353975 -0.050335106 
        S193         S194         S195         S196         S197         S198 
-0.060132648 -0.013423302  0.964332121 -0.023076775 -0.030909077 -0.004503077 
        S199         S200 
-0.034675568 -0.024282568 
> 
> ## "train" simple Bernoulli model on a subset of 100 samples
> model <- nullModel(y ~ 0, pheno[1:100, ])
> model
Simple Bernoulli model:
	Raw phenotypes (no covariates, no intercept)
	Number of samples: 100
	Number of positives (cases): 4 
	No resampling
> length(model)
[1] 100
> residuals(model)
  S1   S2   S3   S4   S5   S6   S7   S8   S9  S10  S11  S12  S13  S14  S15  S16 
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
 S17  S18  S19  S20  S21  S22  S23  S24  S25  S26  S27  S28  S29  S30  S31  S32 
   0    0    0    0    0    0    0    1    0    0    0    0    0    0    0    0 
 S33  S34  S35  S36  S37  S38  S39  S40  S41  S42  S43  S44  S45  S46  S47  S48 
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0 
 S49  S50  S51  S52  S53  S54  S55  S56  S57  S58  S59  S60  S61  S62  S63  S64 
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    1    0 
 S65  S66  S67  S68  S69  S70  S71  S72  S73  S74  S75  S76  S77  S78  S79  S80 
   0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    1 
 S81  S82  S83  S84  S85  S86  S87  S88  S89  S90  S91  S92  S93  S94  S95  S96 
   0    0    0    1    0    0    0    0    0    0    0    0    0    0    0    0 
 S97  S98  S99 S100 
   0    0    0    0 
> 
> ## alternatively, use the interface that only supplies the
> ## trait vector
> model <- nullModel(y=pheno[1:100, ]$y)
> model
Simple Bernoulli model:
	Raw phenotypes (no covariates, no intercept)
	Number of samples: 100
	Number of positives (cases): 4 
	No resampling
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>