
sweets                  package:MM                  R Documentation

Synthetic dataset due to Hankin

Description

Four objects:

  • sweets is a 2x3x21 array

  • sweets_tally is a length 37 vector

  • sweets_array is a 2x3x37 array

  • sweets_table is a 37x6 matrix

Usage

data(sweets)

Details

Object sweets is the raw dataset; objects sweets_table and sweets_tally are processed versions which are easier to analyze.

The father of a certain family brings home nine sweets of type mm and nine sweets of type jb to his children, AMH, ZJH, and AGH, each day for 21 days.

The children share the sweets amongst themselves in such a way that each child receives exactly 6 sweets.

  • Array sweets has dimension c(2,3,21): 2 types of sweets, 3 children, and 21 days. Thus sweets[,,1] shows that on the first day, child AMH chose 0 sweets of type mm and 6 sweets of type jb; child ZJH chose 3 of each; and child AGH chose 6 sweets of type mm and 0 sweets of type jb.

    Observe the constant marginal totals: the kids each receive the same overall number of sweets, and there is a fixed number of each kind of sweet; these margins are checked in the sketch following this list.

  • Array sweets_array has dimension c(2,3,37): 2 types of sweets, 3 children, and 37 possible ways of arranging a matrix with the specified marginal totals. It can be produced by allboards() of the aylmer package.

  • sweets_table is a dataframe with six columns, one for each combination of child and sweet, and 37 rows, each row showing a permissible arrangement. All possibilities are present. The six entries of sweets[,,1] correspond to the six elements of sweets_table[1,]; the column names are mnemonics.

  • sweets_tally shows how often each of the arrangements in sweets_array was observed (that is, it is a table of the 21 daily observations in sweets).
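
The margins and correspondences described above can be checked directly; the following is a minimal sketch using base R only (the specific calls are illustrative and are not part of the package's own examples):

data(sweets)

# nine sweets of each type are brought home every day ...
apply(sweets, c(1,3), sum)        # 2 x 21, all entries 9
# ... and every child ends up with six sweets every day
apply(sweets, c(2,3), sum)        # 3 x 21, all entries 6

# each of the 37 candidate arrangements satisfies the same margins
apply(sweets_array, c(1,3), sum)  # 2 x 37, all entries 9
apply(sweets_array, c(2,3), sum)  # 3 x 37, all entries 6

# the six entries of sweets[,,1] are the six elements of sweets_table[1,]
sweets[,,1]
sweets_table[1,]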

Source

The Hankin family

Examples

data(sweets)

# show correspondence between sweets_table and sweets_tally:
cbind(sweets_table, sweets_tally)

# Sum the data by sweet and child, and test:
fisher.test(apply(sweets,1:2,sum))
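# (the summed table is 2 sweet types by 3 children, pooled over the 21 days;
# Fisher's exact test looks for association between sweet type and child)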
# Not significant!




# Now test for overdispersion.
# First set up the regressors:

jj1 <- apply(sweets_array,3,tcrossprod)
jj2 <- apply(sweets_array,3, crossprod)
dim(jj1) <- c(2,2,37)
dim(jj2) <- c(3,3,37)

theta_xy <- jj1[1,2,]
  phi_ab <- jj2[1,2,]
  phi_ac <- jj2[1,3,]
  phi_bc <- jj2[2,3,]
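
# theta_xy is, for each arrangement, the product of the two sweet-type counts
# within each child, summed over the three children; phi_ab, phi_ac, phi_bc
# are the products of the counts for each pair of children, summed over the
# two sweet types.  These are the regressors for the interaction terms below.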

# Now the offset:
Off <- apply(sweets_array,3,function(x){-sum(lfactorial(x))})
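# (Off is minus the sum of the log-factorials of the counts, i.e. the log of
# the multinomial coefficient up to an additive constant, the total of 18
# sweets being the same for every arrangement)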

# Now the formula:
f <- formula(sweets_tally~ -1 + theta_xy + phi_ab + phi_ac + phi_bc)

# Now the Lindsey Poisson device:
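# (regress the 37 tallies on the statistics above, Poisson family, with Off
# as an offset)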
out <- glm(formula=f, offset=Off, family=poisson)

summary(out)
# See how the residual deviance is comparable with the degrees of freedom
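
# An informal follow-up, not part of the original example: under an adequate
# fit the residual deviance is roughly chi-squared on the residual degrees of
# freedom, so an approximate goodness-of-fit p-value is
pchisq(deviance(out), df.residual(out), lower.tail = FALSE)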

Results


> library(MM)
Loading required package: magic
Loading required package: abind
Loading required package: partitions
Loading required package: emulator
Loading required package: mvtnorm
Loading required package: Oarray
> data(sweets)
> 
> # show correspondence between sweets_table and sweets_tally:
> cbind(sweets_table, sweets_tally)
   AMH_mm AMH_jb ZJH_mm ZJH_jb AGH_mm AGH_jb sweets_tally
1       0      6      3      3      6      0            1
2       0      6      4      2      5      1            0
3       0      6      5      1      4      2            0
4       0      6      6      0      3      3            2
5       1      5      2      4      6      0            1
6       1      5      3      3      5      1            0
7       1      5      4      2      4      2            0
8       1      5      5      1      3      3            1
9       1      5      6      0      2      4            0
10      2      4      1      5      6      0            1
11      2      4      2      4      5      1            0
12      2      4      3      3      4      2            0
13      2      4      4      2      3      3            0
14      2      4      5      1      2      4            0
15      2      4      6      0      1      5            0
16      3      3      0      6      6      0            3
17      3      3      1      5      5      1            0
18      3      3      2      4      4      2            0
19      3      3      3      3      3      3            0
20      3      3      4      2      2      4            0
21      3      3      5      1      1      5            0
22      3      3      6      0      0      6            4
23      4      2      0      6      5      1            0
24      4      2      1      5      4      2            0
25      4      2      2      4      3      3            0
26      4      2      3      3      2      4            0
27      4      2      4      2      1      5            0
28      4      2      5      1      0      6            0
29      5      1      0      6      4      2            0
30      5      1      1      5      3      3            1
31      5      1      2      4      2      4            1
32      5      1      3      3      1      5            0
33      5      1      4      2      0      6            0
34      6      0      0      6      3      3            3
35      6      0      1      5      2      4            0
36      6      0      2      4      1      5            0
37      6      0      3      3      0      6            3
> 
> # Sum the data, by sweet and child and test:
> fisher.test(apply(sweets,1:2,sum))

	Fisher's Exact Test for Count Data

data:  apply(sweets, 1:2, sum)
p-value = 0.2408
alternative hypothesis: two.sided

> # Not significant!
> 
> 
> 
> 
> # Now test for overdispersion.
> # First set up the regressors:
> 
> jj1 <- apply(sweets_array,3,tcrossprod)
> jj2 <- apply(sweets_array,3, crossprod)
> dim(jj1) <- c(2,2,37)
> dim(jj2) <- c(3,3,37)
> 
> theta_xy <- jj1[1,2,]
>   phi_ab <- jj2[1,2,]
>   phi_ac <- jj2[1,3,]
>   phi_bc <- jj2[2,3,]
> 
> # Now the offset:
> Off <- apply(sweets_array,3,function(x){-sum(lfactorial(x))})
> 
> # Now the formula:
> f <- formula(sweets_tally~ -1 + theta_xy + phi_ab + phi_ac + phi_bc)
> 
> # Now the Lindsey Poisson device:
> out <- glm(formula=f, offset=Off, family=poisson)
> 
> summary(out)

Call:
glm(formula = f, family = poisson, offset = Off)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2322  -1.0071  -0.3001   0.1001   2.1276  

Coefficients:
         Estimate Std. Error z value Pr(>|z|)    
theta_xy -1.54215    0.11374  -13.56   <2e-16 ***
phi_ab    0.87572    0.03835   22.83   <2e-16 ***
phi_ac    0.88047    0.03804   23.14   <2e-16 ***
phi_bc    0.85525    0.03988   21.44   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 666.028  on 37  degrees of freedom
Residual deviance:  27.152  on 33  degrees of freedom
AIC: 62.007

Number of Fisher Scoring iterations: 6

> #  See how the residual deviance is comparable with the degrees of freedom  