summaryP produces a tall and thin data frame containing
numerators (freq) and denominators (denom) after
stratifying the data by a series of variables. A special capability
to group a series of related yes/no variables is included through the
use of the ynbind function, for which the user specifies a final
argument label used to label the panel created for that group
of related variables.
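The tall-and-thin layout described above can be sketched in base R. This is only an illustration of the idea (one row per stratum and category level, carrying a numerator and a denominator), not the code summaryP itself runs; the toy data and the object names tab and tall are assumptions made for the example.

```r
# Toy data; variable names and values are illustrative only
d <- data.frame(
  sex    = c("F", "M", "F", "M", "F", "M"),
  region = c("E", "E", "E", "W", "W", "W")
)

# Denominator: number of observations in each stratum (region)
denom <- table(d$region)

# Numerator: count of each category level within each stratum,
# returned in long form with one row per (val, region) combination
tab <- as.data.frame(table(val = d$sex, region = d$region),
                     responseName = "freq", stringsAsFactors = FALSE)

# Attach the matching denominator and the name of the summarized variable
tab$denom <- as.vector(denom[tab$region])
tab$var   <- "sex"

tall <- tab[, c("region", "var", "val", "freq", "denom")]
tall$prop <- tall$freq / tall$denom
print(tall)
```

Keeping freq and denom as separate columns, rather than storing only the proportion, is what lets a plot method print the counts next to each dot.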
The plot method for summaryP displays proportions as a
multi-panel dot chart using the lattice package's dotplot
function with a special panel function. Numerators and
denominators of proportions are also included as text, in the same
colors as used by an optional groups variable. The
formula argument used in the dotplot call is constructed,
but the user can easily reorder the variables by specifying
formula, with elements named val (category levels),
var (classification variable name), freq (calculated
result) plus the overall cross-classification variables excluding
groups.
The ggplot method for summaryP does not draw numerators
and denominators but the chart is more compact because ggplot2
does not repeat category names the same way as lattice does.
Variable names that are too long to fit in panel strips are renamed
(1), (2), etc. and an attribute "fnvar" is added to the result;
this attribute is a character string defining the abbreviations,
useful in a figure caption.
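The renaming scheme can be illustrated with a few lines of base R. This is a hedged sketch: the exact text stored in the "fnvar" attribute is not shown in this page, so the caption format below is an assumption, and the sketch ignores the additional condition (documented under abblen) that only variables with a single level are abbreviated.

```r
# Labels taken from the example data on this page
labels <- c("MI", "Stroke", "MD withdrawal", "Other event")
abblen <- 5                      # same default as the abblen argument

# Labels longer than abblen characters are replaced by (1), (2), ...
too_long <- nchar(labels) > abblen
short <- labels
short[too_long] <- paste0("(", seq_len(sum(too_long)), ")")

# Hypothetical caption text documenting the abbreviations
caption <- paste(short[too_long], labels[too_long],
                 sep = " ", collapse = "; ")

print(short)
print(caption)
```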
The latex method produces one or more LaTeX tabulars
containing a table representation of the result, with optional
side-by-side display if groups is specified. Multiple
tabulars result from the presence of non-group stratification
factors.
Usage
summaryP(formula, data = NULL, subset = NULL,
         na.action = na.retain, sort=TRUE,
         asna = c("unknown", "unspecified"), ...)
## S3 method for class 'summaryP'
plot(x, formula, groups=NULL, exclude1=TRUE,
     xlim = c(-.05, 1.05),
     text.at=NULL, cex.values = 0.5,
     key = list(columns = length(groupslevels), x = 0.75,
                y = -0.04, cex = 0.9,
                col = trellis.par.get('superpose.symbol')$col,
                corner=c(0,1)),
     outerlabels=TRUE, autoarrange=TRUE, ...)
## S3 method for class 'summaryP'
ggplot(data, mapping, groups=NULL, exclude1=TRUE,
       xlim=c(0, 1), col=NULL, shape=NULL, size=function(n) n ^ (1/4),
       sizerange=NULL, abblen=5, autoarrange=TRUE, addlayer=NULL,
       ..., environment)
## S3 method for class 'summaryP'
latex(object, groups=NULL, exclude1=TRUE, file='', round=3,
      size=NULL, append=TRUE, ...)
Arguments
formula
a formula with the variables for whose levels
proportions are computed on the left hand side, and major
classification variables on the right. The formula needs to include
any variable later used as groups, as the data summarization
does not distinguish between superpositioning and paneling. For the
plot method, formula can provide an override to the default
formula for dotplot().
data
an optional data frame. For ggplot.summaryP, data is the result of summaryP.
subset
an optional subsetting expression or vector
na.action
function specifying how to handle NAs. The
default is to keep all NAs in the analysis frame.
sort
set to FALSE to not sort category levels in
descending order of global proportions
asna
character vector specifying level names to consider the
same as NA. Set asna=NULL to not consider any.
x
an object produced by summaryP
groups
a character string containing the name of a
superpositioning variable for obtaining
further stratification within a horizontal line in the dot chart.
exclude1
By default, ggplot, plot, and
latex methods for summaryP remove redundant entries
from tables for variables with only two levels. For example, if you
print the proportion of females, you don't need to print the
proportion of males. To override this, set exclude1=FALSE.
xlim
x-axis limits. The default is c(-.05, 1.05) for plot and c(0, 1) for ggplot.
text.at
specify to leave unused space to the right of each
panel to prevent numerators and denominators from touching data
points. text.at is the upper limit for scaling panels'
x-axes but tick marks are only labeled up to max(xlim).
cex.values
character size to use for plotting numerators and
denominators
key
a list to pass to the auto.key argument of
dotplot. To place a key above the entire chart use
auto.key=list(columns=2) for example.
outerlabels
by default if there are two conditioning variables
besides groups, the latticeExtra package's
useOuterStrips function is used to put strip labels in the
margins, usually resulting in a much prettier chart. Set to
FALSE to prevent usage of useOuterStrips.
autoarrange
If TRUE, the formula is re-arranged so that
if there are two conditioning (paneling) variables, the variable with
the most levels is taken as the vertical condition.
col
a vector of colors to use to override defaults in
ggplot
shape
a vector of plotting symbols to override ggplot
defaults
mapping, environment
not used; needed because of rules for generics
size
for ggplot, a function that transforms denominators
into metrics used for the size aesthetic. Default is the
fourth root function so that the area of symbols is proportional to
the square root of sample size. Specify NULL to not vary point
sizes. size=sqrt is a reasonable alternative. Set
size to an integer to categorize the denominators into
size quantile groups using cut2. Unless size is
an integer, the legend for sizes uses the minimum and maximum
denominators and 6-tiles using quantile(..., type=1) so that
actually occurring sample sizes are used as labels. size is
overridden to NULL if the range in denominators is less than 10
or the ratio of the maximum to the minimum is less than 1.2.
For latex, size is an optional font size such as
"small"
sizerange
a 2-vector specifying the range argument to the
ggplot2 scale_size_... function, which is the
range of sizes allowed for the points according to the denominator.
The default is sizerange=c(.7, 3.25) but the lower limit is
increased according to the ratio of maximum to minimum sample sizes.
abblen
labels of variables having only one level and having
their name longer than abblen characters are
abbreviated and documented in fnvar (described elsewhere
here). The default abblen=5 is good for labels plotted
vertically. If labels are rotated using theme a better value
would be 12.
...
ignored
object
an object produced by summaryP
file
file name, defaults to writing to console
round
number of digits to the right of the decimal place for
proportions
append
set to FALSE to start output over
addlayer
a ggplot layer to add to the plot object
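Three of the rules stated in the arguments above (asna, exclude1 and size) can be sketched in base R. The sketch is an illustration under assumptions, not the Hmisc implementation: asna is matched exactly here, the level dropped for exclude1 is arbitrarily the first one, "range in denominators" is read as maximum minus minimum, and vary_size is a hypothetical helper name.

```r
# asna: treat selected level names as missing (exact matching assumed)
asna <- c("unknown", "unspecified")
race <- c("White", "unknown", "Asian", "unspecified", NA)
race[race %in% asna] <- NA

# exclude1: drop one redundant level of each two-level variable
tall <- data.frame(
  var   = c("sex", "sex", "race", "race", "race"),
  val   = c("Female", "Male", "Asian", "Black/AA", "White"),
  freq  = c(40, 60, 30, 35, 35),
  denom = 100
)
n_levels     <- as.vector(table(tall$var)[tall$var])  # levels per variable
first_of_var <- !duplicated(tall$var)                 # first row of each variable
reduced <- tall[!(n_levels == 2 & first_of_var), ]    # which level to drop is an assumption

# size: default fourth-root transform of the denominators, so that
# symbol area (size squared) grows like the square root of sample size
size_fun <- function(n) n ^ (1/4)

# Hypothetical helper: FALSE when the documented rule would reset size to NULL
vary_size <- function(denom) {
  r <- range(denom)
  !(diff(r) < 10 || r[2] / r[1] < 1.2)
}

print(race)
print(reduced)
print(size_fun(c(16, 81, 256)))
print(vary_size(c(20, 100)))
print(vary_size(c(1000, 1100)))
```

With denominators of 1000 and 1100 the ratio is 1.1, below the 1.2 threshold, so point sizes would not be varied even though the absolute difference is large.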
Value
summaryP produces a data frame of class
"summaryP". The plot method produces a lattice
object of class "trellis". The latex method produces an
object of class "latex" with an additional attribute
ngrouplevels specifying the number of levels of any
groups variable and an attribute nstrata specifying the
number of strata.
Examples
n <- 100
f <- function(na=FALSE) {
  x <- sample(c('N', 'Y'), n, TRUE)
  if(na) x[runif(100) < .1] <- NA
  x
}
set.seed(1)
d <- data.frame(x1=f(), x2=f(), x3=f(), x4=f(), x5=f(), x6=f(), x7=f(TRUE),
                age=rnorm(n, 50, 10),
                race=sample(c('Asian', 'Black/AA', 'White'), n, TRUE),
                sex=sample(c('Female', 'Male'), n, TRUE),
                treat=sample(c('A', 'B'), n, TRUE),
                region=sample(c('North America','Europe'), n, TRUE))
d <- upData(d, labels=c(x1='MI', x2='Stroke', x3='AKI', x4='Migraines',
                        x5='Pregnant', x6='Other event', x7='MD withdrawal',
                        race='Race', sex='Sex'))
dasna <- subset(d, region=='North America')
with(dasna, table(race, treat))
s <- summaryP(race + sex + ynbind(x1, x2, x3, x4, x5, x6, x7, label='Exclusions') ~
              region + treat, data=d)
# add exclude1=FALSE below to include female category
plot(s, groups='treat')
ggplot(s, groups='treat')
plot(s, val ~ freq | region * var, groups='treat', outerlabels=FALSE)
# Much better looking if omit outerlabels=FALSE; see output at
# http://biostat.mc.vanderbilt.edu/HmiscNew#summaryP
# See more examples under bpplotM
# Make a chart where there is a block of variables that
# are only analyzed for males. Keep redundant sex in block for demo.
# Leave extra space for numerators, denominators
sb <- summaryP(race + sex +
               pBlock(race, sex, label='Race: Males', subset=sex=='Male') ~
               region, data=d)
plot(sb, text.at=1.3)
plot(sb, groups='region', layout=c(1,3), key=list(space='top'),
     text.at=1.15)
ggplot(sb, groups='region')
## Not run:
plot(s, groups='treat')
# plot(s, groups='treat', outerlabels=FALSE) for standard lattice output
plot(s, groups='region', key=list(columns=2, space='bottom'))
colorFacet(ggplot(s))
plot(summaryP(race + sex ~ region, data=d), exclude1=FALSE, col='green')
# Make your own plot using data frame created by summaryP
useOuterStrips(dotplot(val ~ freq | region * var, groups=treat, data=s,
       xlim=c(0,1), scales=list(y='free', rot=0), xlab='Fraction',
       panel=function(x, y, subscripts, ...) {
         denom <- s$denom[subscripts]
         x <- x / denom
         panel.dotplot(x=x, y=y, subscripts=subscripts, ...) }))
# Show marginal summary for all regions combined
s <- summaryP(race + sex ~ region, data=addMarginal(d, region))
plot(s, groups='region', key=list(space='top'), layout=c(1,2))
# Show marginal summaries for both race and sex
s <- summaryP(ynbind(x1, x2, x3, x4, label='Exclusions', sort=FALSE) ~
race + sex, data=addMarginal(d, race, sex))
plot(s, val ~ freq | sex*race)
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Loading required package: ggplot2
Attaching package: 'Hmisc'
The following objects are masked from 'package:base':
format.pval, round.POSIXt, trunc.POSIXt, units
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/Hmisc/summaryP.Rd_%03d_medium.png", width=480, height=480)
> ### Name: summaryP
> ### Title: Multi-way Summary of Proportions
> ### Aliases: summaryP plot.summaryP ggplot.summaryP latex.summaryP
> ### Keywords: hplot category manip
>
> ### ** Examples
>
> n <- 100
> f <- function(na=FALSE) {
+ x <- sample(c('N', 'Y'), n, TRUE)
+ if(na) x[runif(100) < .1] <- NA
+ x
+ }
> set.seed(1)
> d <- data.frame(x1=f(), x2=f(), x3=f(), x4=f(), x5=f(), x6=f(), x7=f(TRUE),
+ age=rnorm(n, 50, 10),
+ race=sample(c('Asian', 'Black/AA', 'White'), n, TRUE),
+ sex=sample(c('Female', 'Male'), n, TRUE),
+ treat=sample(c('A', 'B'), n, TRUE),
+ region=sample(c('North America','Europe'), n, TRUE))
> d <- upData(d, labels=c(x1='MI', x2='Stroke', x3='AKI', x4='Migraines',
+ x5='Pregnant', x6='Other event', x7='MD withdrawal',
+ race='Race', sex='Sex'))
Input object size: 12352 bytes; 12 variables 100 observations
New object size: 14832 bytes; 12 variables 100 observations
> dasna <- subset(d, region=='North America')
> with(dasna, table(race, treat))
treat
race A B
Asian 8 8
Black/AA 8 13
White 8 7
> s <- summaryP(race + sex + ynbind(x1, x2, x3, x4, x5, x6, x7, label='Exclusions') ~
+ region + treat, data=d)
> # add exclude1=FALSE below to include female category
> plot(s, groups='treat')
> ggplot(s, groups='treat')
>
> plot(s, val ~ freq | region * var, groups='treat', outerlabels=FALSE)
> # Much better looking if omit outerlabels=FALSE; see output at
> # http://biostat.mc.vanderbilt.edu/HmiscNew#summaryP
> # See more examples under bpplotM
>
> # Make a chart where there is a block of variables that
> # are only analyzed for males. Keep redundant sex in block for demo.
> # Leave extra space for numerators, denominators
> sb <- summaryP(race + sex +
+ pBlock(race, sex, label='Race: Males', subset=sex=='Male') ~
+ region, data=d)
> plot(sb, text.at=1.3)
> plot(sb, groups='region', layout=c(1,3), key=list(space='top'),
+ text.at=1.15)
> ggplot(sb, groups='region')
> ## Not run:
> ##D plot(s, groups='treat')
> ##D # plot(s, groups='treat', outerlabels=FALSE) for standard lattice output
> ##D plot(s, groups='region', key=list(columns=2, space='bottom'))
> ##D colorFacet(ggplot(s))
> ##D
> ##D plot(summaryP(race + sex ~ region, data=d), exclude1=FALSE, col='green')
> ##D
> ##D # Make your own plot using data frame created by summaryP
> ##D useOuterStrips(dotplot(val ~ freq | region * var, groups=treat, data=s,
> ##D xlim=c(0,1), scales=list(y='free', rot=0), xlab='Fraction',
> ##D panel=function(x, y, subscripts, ...) {
> ##D denom <- s$denom[subscripts]
> ##D x <- x / denom
> ##D panel.dotplot(x=x, y=y, subscripts=subscripts, ...) }))
> ##D
> ##D # Show marginal summary for all regions combined
> ##D s <- summaryP(race + sex ~ region, data=addMarginal(d, region))
> ##D plot(s, groups='region', key=list(space='top'), layout=c(1,2))
> ##D
> ##D # Show marginal summaries for both race and sex
> ##D s <- summaryP(ynbind(x1, x2, x3, x4, label='Exclusions', sort=FALSE) ~
> ##D race + sex, data=addMarginal(d, race, sex))
> ##D plot(s, val ~ freq | sex*race)
> ## End(Not run)
>
> dev.off()
null device
1
>