Four-timepoint longitudinal data for 1000 simulees, generated by Monte Carlo simulation. The response variable is a discrete count. There are three time-invariant covariates. The data are available in both "wide" and "long" formats.
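The generating model for this dataset is not documented here. As a purely hypothetical sketch, one common way to produce overdispersed longitudinal counts like these is to draw a person-level gamma frailty and then sample conditionally Poisson counts (all parameter values below are illustrative, not the dataset's true values):

```r
## Hypothetical data-generating sketch (NOT the documented generating model):
## a person-level gamma multiplier makes the marginal counts overdispersed.
set.seed(1)
N <- 1000                                  # simulees
waves <- 0:3                               # four assessment waves
x <- matrix(rnorm(N * 3), ncol = 3)        # three time-invariant covariates
u <- rgamma(N, shape = 2, rate = 2)        # person-level gamma frailty
sim <- expand.grid(id = 1:N, tiem = waves) # 4000 person-wave rows, "long" format
mu <- exp(0.5 + 0.8 * sim$tiem +
          0.3 * x[sim$id, 1] + 0.2 * x[sim$id, 2] - 0.1 * x[sim$id, 3]) * u[sim$id]
sim$y <- rpois(nrow(sim), lambda = mu)     # conditionally Poisson, marginally overdispersed
```

Marginally, the variance of such counts exceeds the mean, which is the overdispersion the examples below diagnose.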
Usage
data("LongitudinalOverdispersedCounts")
Format
The "long" format dataframe, longData, has 4000 rows and the following variables (columns):
id: Factor; simulee ID code.
tiem: Numeric; the time metric (wave of assessment). Note that the column is indeed named tiem in the dataset.
x1: Numeric; time-invariant covariate.
x2: Numeric; time-invariant covariate.
x3: Numeric; time-invariant covariate.
y: Numeric; the response ("dependent") variable.
The "wide" format dataset, wideData, is a numeric 1000x12 matrix containing the following variables (columns):
id: Simulee ID code.
x1: Time-invariant covariate.
x2: Time-invariant covariate.
x3: Time-invariant covariate.
y0: Response at initial wave of assessment.
y1: Response at first follow-up.
y2: Response at second follow-up.
y3: Response at third follow-up.
t0: Time variable at initial wave of assessment (in this case, 0).
t1: Time variable at first follow-up (in this case, 1).
t2: Time variable at second follow-up (in this case, 2).
t3: Time variable at third follow-up (in this case, 3).
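The two formats carry the same information, so one can be derived from the other. A sketch using base-R reshape(), with a small mock data frame standing in for wideData so the example is self-contained (column names follow the documentation above):

```r
## Sketch: wide-to-long conversion with base-R reshape().
## 'wide' is a mock stand-in for wideData, with the documented column layout.
wide <- data.frame(id = 1:3,
                   x1 = rnorm(3), x2 = rnorm(3), x3 = rnorm(3),
                   y0 = c(1, 2, 0), y1 = c(4, 3, 3),
                   y2 = c(4, 24, 3), y3 = c(13, 23, 9),
                   t0 = 0, t1 = 1, t2 = 2, t3 = 3)
long <- reshape(wide, direction = "long",
                varying = list(paste0("y", 0:3), paste0("t", 0:3)),
                v.names = c("y", "tiem"), timevar = "wave", idvar = "id")
long <- long[order(long$id), ]   # one row per person-wave, as in longData
```

Each simulee contributes one row per wave, so three simulees yield twelve long-format rows.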
Examples
data(LongitudinalOverdispersedCounts)
head(wideData)
str(longData)
#Let's try ordinary least-squares (OLS) regression:
olsmod <- lm(y~tiem+x1+x2+x3, data=longData)
#The diagnostic plots will show that the residuals are poorly approximated by a normal
#distribution and are heteroskedastic. We also know that the residuals are not independent
#of one another, because we have repeated-measures data:
plot(olsmod)
#In the summary, all of the regression coefficients appear significantly different from
#zero, but because the assumptions of OLS regression are violated, we should not trust
#its results:
summary(olsmod)
#Let's try a generalized linear model (GLM). We'll use the quasi-Poisson quasilikelihood
#function to see how well the y variable is approximated by a Poisson distribution
#(conditional on time and covariates):
glm.mod <- glm(y~tiem+x1+x2+x3, data=longData, family="quasipoisson")
#The estimate of the dispersion parameter should be about 1.0 if the data are
#conditionally Poisson. We can see that it is actually greater than 2,
#indicating overdispersion:
summary(glm.mod)
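The dispersion estimate reported by summary() for a quasi-Poisson fit is the Pearson chi-square statistic divided by the residual degrees of freedom. A self-contained sketch of that computation, on simulated counts that are deliberately overdispersed (the fit to longData itself is shown under Results):

```r
## Sketch: computing the quasi-Poisson dispersion estimate by hand.
set.seed(2)
yy <- rnbinom(500, mu = 5, size = 2)       # deliberately overdispersed counts
qp <- glm(yy ~ 1, family = "quasipoisson")
disp <- sum(residuals(qp, type = "pearson")^2) / qp$df.residual
disp                                       # clearly exceeds 1, flagging overdispersion
```

A value near 1.0 is consistent with a conditionally Poisson response; values well above 1.0, like the 2.16 seen below, indicate overdispersion.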
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(OpenMx)
Loading required package: digest
Loading required package: MASS
Loading required package: Matrix
Loading required package: Rcpp
Loading required package: parallel
Attaching package: 'OpenMx'
The following objects are masked from 'package:Matrix':
%&%, expm
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/OpenMx/LongitudinalOverdispersedCounts.Rd_%03d_medium.png", width=480, height=480)
> ### Name: LongitudinalOverdispersedCounts
> ### Title: Longitudinal, Overdispersed Count Data
> ### Aliases: LongitudinalOverdispersedCounts longData wideData
> ### Keywords: datasets
>
> ### ** Examples
>
> data(LongitudinalOverdispersedCounts)
> head(wideData)
id x1 x2 x3 y0 y1 y2 y3 t0 t1 t2 t3
[1,] 1 0.09028680 -0.70454619 0.98179355 1 4 4 13 0 1 2 3
[2,] 2 -0.60569794 1.84021070 0.34143632 2 3 24 23 0 1 2 3
[3,] 3 -1.64132905 0.06420197 0.18268172 0 3 3 9 0 1 2 3
[4,] 4 -0.94034250 0.13452838 1.41092610 2 2 2 17 0 1 2 3
[5,] 5 -0.08902176 -0.64903624 0.08836685 1 12 6 23 0 1 2 3
[6,] 6 -1.61535407 0.99948904 0.03628061 1 5 4 15 0 1 2 3
> str(longData)
'data.frame': 4000 obs. of 6 variables:
$ id : Factor w/ 1000 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...
$ tiem: num 0 1 2 3 0 1 2 3 0 1 ...
$ x1 : num 0.0903 0.0903 0.0903 0.0903 -0.6057 ...
$ x2 : num -0.705 -0.705 -0.705 -0.705 1.84 ...
$ x3 : num 0.982 0.982 0.982 0.982 0.341 ...
$ y : num 1 4 4 13 2 3 24 23 0 3 ...
> #Let's try ordinary least-squares (OLS) regression:
> olsmod <- lm(y~tiem+x1+x2+x3, data=longData)
> #The diagnostic plots will show that the residuals are poorly approximated by a normal
> #distribution and are heteroskedastic. We also know that the residuals are not independent
> #of one another, because we have repeated-measures data:
> plot(olsmod)
> #In the summary, all of the regression coefficients appear significantly different from
> #zero, but because the assumptions of OLS regression are violated, we should not trust
> #its results:
> summary(olsmod)
Call:
lm(formula = y ~ tiem + x1 + x2 + x3, data = longData)
Residuals:
Min 1Q Median 3Q Max
-15.566 -4.507 -0.873 3.405 55.311
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.4667 0.1802 -2.590 0.00963 **
tiem 6.8208 0.0968 70.461 < 2e-16 ***
x1 2.9791 0.1141 26.100 < 2e-16 ***
x2 1.8477 0.1153 16.029 < 2e-16 ***
x3 -1.0792 0.1109 -9.731 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.793 on 3935 degrees of freedom
(60 observations deleted due to missingness)
Multiple R-squared: 0.6238, Adjusted R-squared: 0.6234
F-statistic: 1631 on 4 and 3935 DF, p-value: < 2.2e-16
>
> #Let's try a generalized linear model (GLM). We'll use the quasi-Poisson quasilikelihood
> #function to see how well the y variable is approximated by a Poisson distribution
> #(conditional on time and covariates):
> glm.mod <- glm(y~tiem+x1+x2+x3, data=longData, family="quasipoisson")
> #The estimate of the dispersion parameter should be about 1.0 if the data are
> #conditionally Poisson. We can see that it is actually greater than 2,
> #indicating overdispersion:
> summary(glm.mod)
Call:
glm(formula = y ~ tiem + x1 + x2 + x3, family = "quasipoisson",
data = longData)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.8007 -1.1976 -0.2377 0.7545 4.8464
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.515972 0.022339 23.10 <2e-16 ***
tiem 0.840302 0.008741 96.14 <2e-16 ***
x1 0.305381 0.007884 38.73 <2e-16 ***
x2 0.194600 0.008152 23.87 <2e-16 ***
x3 -0.111168 0.007792 -14.27 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.157702)
Null deviance: 41602.8 on 3939 degrees of freedom
Residual deviance: 8293.9 on 3935 degrees of freedom
(60 observations deleted due to missingness)
AIC: NA
Number of Fisher Scoring iterations: 5
>
> dev.off()
null device
1
>
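Given the overdispersion diagnosed above (dispersion about 2.16), a negative-binomial GLM is a natural next step; MASS (already loaded as an OpenMx dependency) provides glm.nb(). A self-contained sketch on simulated counts, since the fit to longData would require the package data:

```r
## Sketch: fitting a negative-binomial GLM with MASS::glm.nb()
## on simulated overdispersed counts (illustrative parameters).
library(MASS)
set.seed(3)
n <- 1000
tiem <- rep(0:3, length.out = n)
y <- rnbinom(n, mu = exp(0.5 + 0.8 * tiem), size = 2)
nb.mod <- glm.nb(y ~ tiem)
summary(nb.mod)   # theta estimates the negative-binomial size parameter
```

Unlike the quasi-Poisson fit, glm.nb() specifies a full likelihood, so AIC and likelihood-ratio comparisons are available.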