trioSim() simulates parental genotypes, child genotypes,
an environmental attribute and sub-population membership for affected case-parent trios with
informative mating types
from a stratified population. All genotypes are at a test locus that is linked to a causal locus.
Arguments
n
The desired number of informative case-parent trios to simulate.
popfs
A vector of sub-population frequencies whose length is equal to the number of sub-populations.
hapfs
A list of haplotype frequency vectors, one for each sub-population.
See Details for the assumed order of haplotypes.
edists
A list of functions that simulate the environmental attribute, one for each sub-population.
As in the Examples, each function takes the number of values to simulate and returns a vector of that length.
recomb
Recombination frequency between the test and causal loci.
Currently not implemented; the function stops with an error if a non-zero value is specified.
riskmod
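A function that simulates disease status from the risk (probability) of disease.
The function should take two arguments: the first is the child's genotype,
and the second is the environmental attribute. As in the Examples, both arguments
are vectors with one element per trio, and the function returns a vector of
disease indicators (1 = affected, 0 = unaffected).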
batchsize
Size of the batches of trios to simulate. See Details for more information.
Details
The function simulates trios from a stratified population.
Population stratification is controlled by the user's choice of
sub-population sizes, haplotype frequencies in each sub-population and the
distribution of the environmental attribute in each sub-population.
Given sub-population sizes, the degree of population stratification increases as the
haplotype frequencies and the distributions of the environmental attribute differ
more among sub-populations.
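The effect described above can be illustrated without calling trioSim(). The following is a simplified sketch in base R, under stated assumptions: each sub-population has its own risk-allele frequency and normal attribute distribution (values borrowed from the Examples), genotypes are binomial allele counts under Hardy-Weinberg proportions, and pooled_ge_cor() is a hypothetical helper, not part of trioGxE. Genotype and attribute are independent within each sub-population but become correlated in the pooled sample.

```r
# Hypothetical illustration, not part of trioGxE.
# Within each sub-population, genotype G (copies of the risk allele, HWE)
# and attribute E are drawn independently; pooling induces a G-E association.
pooled_ge_cor <- function(n, popfs, allele_freqs, e_means, e_sd) {
  k <- length(popfs)
  s <- sample(seq_len(k), size = n, replace = TRUE, prob = popfs)
  G <- rbinom(n, size = 2, prob = allele_freqs[s])
  E <- rnorm(n, mean = e_means[s], sd = e_sd)
  c(pooled = cor(G, E),
    # average of the within-sub-population correlations
    within = mean(sapply(seq_len(k), function(j) cor(G[s == j], E[s == j]))))
}

set.seed(1)
pooled_ge_cor(5000, popfs = c(0.5, 0.5), allele_freqs = c(0.1, 0.9),
              e_means = c(-0.8, 0.8), e_sd = sqrt(1 - 0.8^2))
```

With these settings the population-level correlation is 0.64 / sqrt(0.82), roughly 0.71, while the within-sub-population correlations are zero in expectation; making the allele frequencies or attribute means more similar shrinks the pooled correlation.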
The function first simulates sub-population membership for each trio using
the sub-population frequencies supplied by the user in the argument popfs.
Conditional on sub-population, parental haplotypes Hp are simulated assuming
Hardy-Weinberg proportions, using the sub-population-specific haplotype frequencies
in the argument hapfs.
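A minimal sketch of these two steps follows. It assumes haplotypes are coded 1-4 in the order N0, N1, R0, R1 given in the next paragraph; sim_subpop_haps() is a hypothetical name and this is not the package's internal code.

```r
# Hypothetical helper, not the trioGxE implementation.
# Haplotypes are coded 1-4 in the order N0, N1, R0, R1.
sim_subpop_haps <- function(n, popfs, hapfs) {
  k <- length(popfs)
  # Sub-populations are labelled 0, ..., k-1, as in the returned 'subpop' column
  subpop <- sample(0:(k - 1), size = n, replace = TRUE, prob = popfs)
  haps <- matrix(NA_integer_, nrow = n, ncol = 2,
                 dimnames = list(NULL, c("h1", "h2")))
  for (s in 0:(k - 1)) {
    idx <- which(subpop == s)
    if (length(idx) == 0) next
    # Two independent draws per parent give Hardy-Weinberg proportions
    haps[idx, ] <- sample(1:4, size = 2 * length(idx), replace = TRUE,
                          prob = hapfs[[s + 1]])
  }
  data.frame(subpop = subpop, haps)
}
```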
Haplotype frequencies should be in the order N0, N1, R0, R1, where N and R denote
the non-risk and risk alleles at the causal locus, and 0 and 1 denote the non-index
and index alleles at the test locus.
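To make the ordering concrete, this small sketch decodes haplotype codes 1-4 (N0, N1, R0, R1) into causal-locus and test-locus alleles and recovers allele frequencies from a haplotype frequency vector. The helper names are hypothetical; the frequency vector is hapf1 from the Examples.

```r
# Haplotype codes 1-4 in the documented order N0, N1, R0, R1.
causal_allele <- function(h) as.integer(h >= 3)       # 1 = risk allele (R)
test_allele   <- function(h) as.integer(h %% 2 == 0)  # 1 = index allele
hapf <- c(N0 = 0.9, N1 = 0, R0 = 0, R1 = 0.1)  # 'hapf1' from the Examples
risk_freq  <- sum(hapf[c("R0", "R1")])  # frequency of R at the causal locus: 0.1
index_freq <- sum(hapf[c("N1", "R1")])  # frequency of the index allele: 0.1
```

Because N1 and R0 have frequency zero in this vector, the index allele always travels with the risk allele, which is why the Examples describe the marker locus as the causal locus.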
To save computation time, we consider only informative parental mating types:
one parent is simulated from the conditional distribution given that the parent
is heterozygous at the test locus, and the other parent is simulated without any restrictions.
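One way to draw such a parent, sketched here under the assumption that a parent's two haplotypes are independent draws from the sub-population's haplotype frequencies, is to enumerate all 16 ordered haplotype pairs, zero out those that are homozygous at the test locus, and sample from the renormalized weights. Both helper functions are hypothetical, and haplotypes are coded 1-4 as N0, N1, R0, R1.

```r
# Hypothetical sketch: draw one parent's haplotype pair (h1, h2) from
# P(h1) * P(h2) restricted to pairs heterozygous at the test locus.
sample_het_parent <- function(n, hapf) {
  pairs <- expand.grid(h1 = 1:4, h2 = 1:4)
  index1 <- pairs$h1 %% 2 == 0  # does h1 carry the index allele?
  index2 <- pairs$h2 %% 2 == 0
  w <- hapf[pairs$h1] * hapf[pairs$h2] * (index1 != index2)
  if (sum(w) == 0) stop("no test-locus heterozygotes possible under 'hapf'")
  # sample() renormalizes the weights in 'prob'
  out <- pairs[sample(nrow(pairs), size = n, replace = TRUE, prob = w), ]
  rownames(out) <- NULL
  out
}

# The other parent is unrestricted: two independent haplotype draws.
sample_any_parent <- function(n, hapf) {
  data.frame(h1 = sample(1:4, n, replace = TRUE, prob = hapf),
             h2 = sample(1:4, n, replace = TRUE, prob = hapf))
}
```

Enumerating the pairs avoids rejection sampling for this step; it is one possible approach and may differ from what trioSim() does internally.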
Conditional on parental haplotypes, child haplotypes are sampled according to Mendel's laws.
The genotypes of the parents and child at the causal and test loci are then extracted
from the sampled haplotypes.
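Because recomb must currently be zero, Mendelian transmission reduces to each parent passing on one of its two haplotypes intact, each with probability 1/2. The sketch below shows this step and the extraction of genotypes as allele counts; the helper names are hypothetical and haplotypes are coded 1-4 as N0, N1, R0, R1.

```r
# Hypothetical sketch. With recomb = 0 each parent passes on one of its two
# haplotypes intact, each with probability 1/2.
transmit <- function(h1, h2) ifelse(runif(length(h1)) < 0.5, h1, h2)

# Genotype = number of copies of the allele of interest
test_genotype   <- function(h1, h2) (h1 %% 2 == 0) + (h2 %% 2 == 0)  # index allele
causal_genotype <- function(h1, h2) (h1 >= 3) + (h2 >= 3)            # risk allele

# Example: parent 1 is N0/R1 (codes 1 and 4), parent 2 is N0/N0
set.seed(1)
n <- 1000
child_h1 <- transmit(rep(1, n), rep(4, n))  # haplotype from parent 1
child_h2 <- transmit(rep(1, n), rep(1, n))  # haplotype from parent 2
child_test <- test_genotype(child_h1, child_h2)
```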
Assuming that genotype and environmental attribute are conditionally independent given
sub-population, the environmental attribute for each trio is simulated conditional on
sub-population using the sub-population-specific simulation functions in the argument edists.
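A minimal sketch of this step, assuming sub-populations are labelled 0, ..., k-1 as in the returned subpop column, so that label s uses edists[[s + 1]]. The helper sim_env() is hypothetical; e1 and e2 are the simulators from the Examples.

```r
# Hypothetical sketch: call the sub-population-specific simulator
# edists[[s + 1]] for the trios in sub-population s.
sim_env <- function(subpop, edists) {
  env <- numeric(length(subpop))
  for (s in unique(subpop)) {
    idx <- which(subpop == s)
    env[idx] <- edists[[s + 1]](length(idx))
  }
  env
}

# Simulators from the Examples section
e1 <- function(n) rnorm(n, mean = -0.8, sd = sqrt(1 - 0.8^2))
e2 <- function(n) rnorm(n, mean = 0.8, sd = sqrt(1 - 0.8^2))
set.seed(3)
subpop <- sample(0:1, 1000, replace = TRUE)
env <- sim_env(subpop, list(e1, e2))
```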
Finally, disease status is simulated according to the risk model in the argument riskmod;
only those trios with affected children are retained.
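The retention step amounts to filtering on simulated disease status. In this sketch the column names child_causal and env and the helper keep_affected() are hypothetical, it assumes the risk model receives the child's causal-locus genotype, and a toy risk model stands in for a user-supplied riskmod.

```r
# Hypothetical sketch. 'trios' is assumed to hold the child's causal-locus
# genotype in 'child_causal' and the attribute in 'env'; riskmod returns 0/1.
keep_affected <- function(trios, riskmod) {
  D <- riskmod(trios$child_causal, trios$env)
  trios[D == 1, , drop = FALSE]
}

# Toy risk model: disease probability 0.1, 0.3, 0.5 for 0, 1, 2 risk alleles
toy_riskmod <- function(G, E) rbinom(length(G), size = 1, prob = 0.1 + 0.2 * G)
set.seed(11)
trios <- data.frame(child_causal = sample(0:2, 3000, replace = TRUE),
                    env = rnorm(3000))
cases <- keep_affected(trios, toy_riskmod)
```

Affected children are enriched for risk alleles, so the retained trios are no longer a random sample of the simulated ones.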
To speed up computation, the rejection sampling of trios is done in batches of size batchsize
until the desired number of affected trios is obtained.
In our simulation studies, a batchsize of about one third of the
desired number of trios appeared to be fastest.
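The batching scheme can be sketched as a loop that keeps simulating batches until enough affected trios have accumulated. Here sim_batch() is an assumed user-supplied function returning only the affected trios from a batch, the max_batches guard is an addition of this sketch, and truncating the surplus to exactly n rows is an assumption rather than documented trioSim() behavior.

```r
# Hypothetical sketch of batched rejection sampling.
sample_in_batches <- function(n, batchsize, sim_batch, max_batches = 1e4) {
  kept <- list()
  total <- 0
  while (total < n) {
    # Guard against a risk model that (almost) never produces affected trios
    if (length(kept) >= max_batches) stop("too few affected trios; check the risk model")
    b <- sim_batch(batchsize)        # affected trios from one batch
    kept[[length(kept) + 1]] <- b
    total <- total + nrow(b)
  }
  out <- do.call(rbind, kept)
  out[seq_len(n), , drop = FALSE]    # assumed: drop the surplus from the last batch
}
```

Larger batches mean fewer loop iterations but more wasted trios in the final batch, which is presumably the trade-off behind the batchsize recommendation above.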
Value
A data frame with columns
parent1
Test locus genotype for the heterozygous parent, coded as 0, 1, 2
representing the number of copies of the index allele.
parent2
Test locus genotype for the other parent, coded the same way.
child
Test locus genotype for the child, coded the same way.
subpop
Sub-population membership for the trio.
Sub-populations are numbered 0,1,...,k-1, where k is the number of sub-populations.
attr
The environmental attribute.
Author(s)
Ji-Hyung Shin <shin@sfu.ca>,
Brad McNeney <mcneney@sfu.ca>,
Jinko Graham <jgraham@sfu.ca>
See Also
trioGxE, plot.trioGxE, test.trioGxE
Examples
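# Generate case-parent trios from a population composed of
# two equal-sized subpopulations.
# Set up a list of functions to sample from each E distribution.
# Both are normal with sd sqrt(1 - 0.8^2) = 0.6, so the pooled E
# distribution has mean 0 and variance 0.8^2 + 0.6^2 = 1.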
e1<-function(n) {
return(rnorm(n,mean=(-0.8),sd=sqrt(1-.8^2)))
}
e2<-function(n) {
return(rnorm(n,mean=(0.8),sd=sqrt(1-.8^2)))
}
# Set up risk model function.
## Simulate informative case-parent trios under additive linear GxE with a negative slope
riskmod<-function(G,E) {
n<-length(G)
# Baseline risk. Affects disease prevalence.
# The higher the prevalence, the less time wasted
# rejecting unaffected trios.
k<-(-2)
betaG<-log(3)/2
# GxE interaction coefficient (negative slope)
betaGE<-(-0.1)
# Disease probability under a log-linear model with a linear GxE term
rr<-exp(k+betaG*G + betaGE*G*E)
rr[rr>1]<-1 # Truncate at one: it is up to the user to make sure
            # the risk model yields no probabilities greater than one.
D<-rbinom(n=n,size=1,prob=rr)
return(D)
}
# Simulate trio data under haplotype-environment dependence
# when the marker locus is the causal locus.
# Risk allele frequency is 0.1 in subpop 0 and 0.9 in subpop 1.
hapf1<-c(0.9, 0, 0, 0.1)
hapf2<-c(0.1, 0, 0, 0.9)
simdat.HEdep<-trioSim(n=3000,popfs=c(0.5,0.5),riskmod=riskmod,
edists=list(e1,e2),hapfs=list(hapf1,hapf2),
recomb=0,batchsize=1000)
# Simulate trio data under haplotype-environment independence
# when the marker locus is the causal locus.
# Risk allele frequency is 0.1 in both subpop 0 and subpop 1.
hapf1<-hapf2<-c(0.9, 0, 0, 0.1)
simdat.HEindep<-trioSim(n=3000,popfs=c(0.5,0.5),riskmod=riskmod,
edists=list(e1,e2),hapfs=list(hapf1,hapf2),
recomb=0,batchsize=1000)