Last data update: 2014.03.03

R: RVPedigree main function
RVPedigree                R Documentation

RVPedigree main function

Description

Main function of the RVPedigree package

Usage

RVPedigree(method = "ASKAT", y = NULL, X = NULL, Phi = NULL,
  filename = NULL, type = "bed", regions = NULL, weights = NULL,
  Nperm = 100, pvalThreshold = 0.1, VCC3afterVCC1 = FALSE, Ncores = 1)

Arguments

method

character, selects the method to use for the association testing. Can be one of the following:

  • "ASKAT" (default)

  • "NASKAT", normalized ASKAT

  • "VCC1", VC-C1

  • "VCC2", VC-C2

  • "VCC3", VC-C3

y

vector of phenotype data (one entry per individual), of length n.

X

matrix of covariates including intercept (dimension: n × p, with p the number of covariates)
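
As an illustration, such a covariate matrix could be assembled in base R as sketched below. The covariates age and sex and their values are hypothetical and only serve to show the expected shape.

```r
# Hypothetical covariates for n = 5 individuals (illustrative values only)
covars <- data.frame(
  age = c(34, 51, 47, 29, 62),
  sex = factor(c("F", "M", "M", "F", "F"))
)

# model.matrix() adds the intercept column and dummy-codes factors
X <- model.matrix(~ age + sex, data = covars)

dim(X)       # n rows, one column per term including the intercept
colnames(X)  # "(Intercept)" "age" "sexM"
```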

Phi

Relationship matrix (i.e. twice the kinship matrix); an n × n square symmetric positive-definite matrix.
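
A minimal sketch of deriving such a matrix from a kinship matrix, assuming a hypothetical non-inbred trio (two unrelated parents and their child); the kinship values are the textbook ones for that pedigree, not data from the package.

```r
# Hypothetical kinship matrix for a trio: two unrelated parents and their child
# (non-inbred: self-kinship 0.5, parent-child kinship 0.25)
ids <- c("father", "mother", "child")
kinship <- matrix(c(0.50, 0.00, 0.25,
                    0.00, 0.50, 0.25,
                    0.25, 0.25, 0.50),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(ids, ids))

# The relationship matrix is twice the kinship matrix
Phi <- 2 * kinship

# Sanity checks matching the documented requirements
stopifnot(isSymmetric(Phi))
eigenvalues <- eigen(Phi, symmetric = TRUE, only.values = TRUE)$values
stopifnot(all(eigenvalues > 0))  # positive-definite
```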

filename

character, path to input file containing haplotype data

type

character, format of the input file containing haplotype data; one of 'ped', 'bed' (default) or 'shapeit-haps'

regions

a data frame with details of the genomic regions in which the association test specified by the method parameter should be run. The data frame should have one row per region and (at least) four columns with the following names:

  • Name: Name of the region (e.g. Gene 01)

  • Chr: Chromosome on which the region is located.

  • StartPos: The base pair coordinate at which the region starts.

  • EndPos: The base pair coordinate at which the region ends.

Any other columns will be ignored.
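
A data frame with the four required columns could be built as in the sketch below. The region names and coordinates are hypothetical, and storing Chr as a number is an assumption; the documentation above does not state the expected column types.

```r
# Hypothetical regions; names and coordinates are illustrative only
regions <- data.frame(
  Name     = c("Gene 01", "Gene 02"),
  Chr      = c(1, 1),
  StartPos = c(1000000, 2500000),
  EndPos   = c(1050000, 2580000),
  stringsAsFactors = FALSE
)

# Check that the required columns are present and each region is well-formed
required <- c("Name", "Chr", "StartPos", "EndPos")
stopifnot(all(required %in% names(regions)))
stopifnot(all(regions$StartPos <= regions$EndPos))
```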

weights

optional numeric vector of genotype weights. If this option is not specified, the beta distribution is used for weighting the variants, with each weight given by w_i = dbeta(f_i, 1, 25)^2, with f_i the minor allele frequency (MAF) of variant i. This default is the same as used by the SKAT package. This vector is used as the diagonal of the m × m matrix W, with m the number of variants.
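
The default weighting formula can be reproduced in base R as follows. The minor allele frequencies are hypothetical; the sketch only illustrates the formula and the construction of W, not the package's internal code.

```r
# Hypothetical minor allele frequencies for m = 4 variants
maf <- c(0.001, 0.005, 0.01, 0.05)

# Default weighting described above: w_i = dbeta(f_i, 1, 25)^2
weights <- dbeta(maf, shape1 = 1, shape2 = 25)^2

# The weights form the diagonal of the m x m matrix W
W <- diag(weights)

# Rarer variants receive larger weights
stopifnot(all(diff(weights) < 0))
```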

Nperm

(integer) The number of permutations to be done to calculate the empirical p-value if the VCC2 or VCC3 method is used. For other methods this parameter is ignored (default: 100).
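
To give a feel for what Nperm controls, the sketch below computes a generic empirical p-value from permuted test statistics. The estimator, the observed statistic and the chi-squared stand-in for the permutation statistics are all assumptions made for illustration; the package's own computation may differ.

```r
set.seed(1)  # reproducible illustration

Nperm <- 100
observed_stat <- 2.5                 # hypothetical observed test statistic
perm_stats <- rchisq(Nperm, df = 1)  # stand-in for statistics from permuted data

# Generic empirical p-value: fraction of permuted statistics at least as
# extreme as the observed one (an assumption, not necessarily the
# estimator used by the package)
p_empirical <- mean(perm_stats >= observed_stat)

# With this estimator the p-value is always a multiple of 1 / Nperm
resolution <- 1 / Nperm
```

Under this generic estimator, a larger Nperm gives a finer p-value resolution at the cost of more computation.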

pvalThreshold

(numeric) Threshold for the association p-value. Regions with a p-value below this threshold will not be present in the output data frame (default: 0.1).

VCC3afterVCC1

(logical) Boolean value that indicates whether the VC-C3 method should automatically be run on the variants passing the p-value threshold set using the pvalThreshold parameter (default: FALSE).

Ncores

(integer) Number of processor (CPU) cores to be used in parallel when running the association analysis. If the number of regions is larger than the number of cores, each region uses at most one core. If the number of cores is larger than the number of regions and the VCC2 or VCC3 method is selected, the remaining cores are distributed among the regions to parallelize the permutations used to determine the p-value (default: 1).
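
One possible reading of this allocation rule is sketched below. The helper allocate_cores is hypothetical and not part of RVPedigree, and the round-robin distribution of spare cores is an assumption; the package may divide them differently.

```r
# Hypothetical helper, not part of RVPedigree: one possible way to spread
# cores over regions following the rule described above
allocate_cores <- function(n_regions, n_cores, method) {
  uses_permutations <- method %in% c("VCC2", "VCC3")
  cores <- rep(1L, n_regions)  # every region uses at least (and by default at most) one core
  extra <- n_cores - n_regions
  if (uses_permutations && extra > 0) {
    # Hand out the remaining cores round-robin over the regions
    idx <- (seq_len(extra) - 1L) %% n_regions + 1L
    cores <- cores + tabulate(idx, nbins = n_regions)
  }
  cores
}

allocate_cores(n_regions = 3, n_cores = 8, method = "VCC3")   # 3 3 2
allocate_cores(n_regions = 3, n_cores = 8, method = "ASKAT")  # 1 1 1
```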

Details

The RVPedigree function is the main function of the RVPedigree package.

Under the hood this function calls ASKAT.region, NormalizedASKAT.region, VCC1.region, VCC2.region or VCC3.region, depending on the method parameter specified by the user.
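
The mapping from the method parameter to the region-level function can be pictured as in the sketch below. The helper region_function_name is hypothetical and only returns the name of the function listed above; it does not reproduce the package's actual dispatch code.

```r
# Map the documented `method` values to the region-level functions named above.
# `region_function_name` is a hypothetical helper for illustration only.
region_function_name <- function(method = "ASKAT") {
  switch(method,
         ASKAT  = "ASKAT.region",
         NASKAT = "NormalizedASKAT.region",
         VCC1   = "VCC1.region",
         VCC2   = "VCC2.region",
         VCC3   = "VCC3.region",
         stop("Unknown method: ", method))
}

region_function_name("NASKAT")  # "NormalizedASKAT.region"
```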

Value

A data frame containing results of the association test specified by the method parameter for each region in the data frame specified by the regions parameter. The output data frame contains the following columns:

  • Score.Test: the score of the given association test

  • P.value: the p-value of the association test

  • N.Markers: the number of markers in the region

  • regionname: the name of the region/gene on which the association test was run

Note that regions that do not contain any genetic variants will be removed from the output.

Author(s)

Lennart C. Karssen
