split divides the data in the vector x into the groups
defined by f. The replacement forms replace values
corresponding to such a division. unsplit reverses the effect of
split.
Usage
split(x, f, drop = FALSE, ...)
split(x, f, drop = FALSE, ...) <- value
unsplit(value, f, drop = FALSE)
Arguments
x
vector or data frame containing values to be divided into groups.
f
a ‘factor’ in the sense that as.factor(f)
defines the grouping, or a list of such factors in which case their
interaction is used for the grouping.
drop
logical indicating if levels that do not occur should be dropped
(if f is a factor or a list).
value
a list of vectors or data frames compatible with a
splitting of x. Recycling applies if the lengths do not match.
...
further potential arguments passed to methods.
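As a minimal sketch of how f and drop interact, the following uses made-up vectors (x, f1 and f2 are illustrative names, not part of the original page). It assumes the default behaviour in which a list of factors is combined through interaction, so every combination of levels becomes a group unless drop = TRUE.

```r
x  <- 1:6
f1 <- factor(c("a", "a", "b", "b", "c", "c"))
f2 <- factor(c("u", "v", "u", "v", "u", "u"))

# A list of factors is combined into one grouping; all level
# combinations are kept, so the unobserved pair "c.v" is an empty group.
by_both <- split(x, list(f1, f2))
names(by_both)

# drop = TRUE removes combinations that do not occur in the data.
by_both_dropped <- split(x, list(f1, f2), drop = TRUE)
names(by_both_dropped)
```

Keeping the empty groups can be useful when later code expects one component per level combination; dropping them is usually more convenient for summaries.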
Details
split and split<- are generic functions with default and
data.frame methods. The data frame method can also be used to
split a matrix into a list of matrices, and the replacement form
likewise, provided they are invoked explicitly (that is, by calling
split.data.frame or `split<-.data.frame` directly).
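The explicit invocation for a matrix might look like the sketch below. The matrix m and the grouping vector row_group are hypothetical; the point is only that calling split.data.frame directly groups rows and returns sub-matrices, whereas the default method would treat the matrix as a plain vector.

```r
m <- matrix(1:12, nrow = 4, dimnames = list(NULL, c("a", "b", "c")))
row_group <- c("g1", "g2", "g1", "g2")

# Calling the data frame method directly splits the matrix by rows,
# giving a list of sub-matrices rather than a list of vectors.
parts <- split.data.frame(m, row_group)
parts$g1
```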
unsplit works with lists of vectors or data frames (assumed to
have compatible structure, as if created by split). It puts
elements or rows back in the positions given by f. In the data
frame case, row names are obtained by unsplitting the row name
vectors from the elements of value.
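A small illustration of the data frame case, using an invented data frame df with explicit row names. It is a sketch rather than a specification; it simply checks that rows and their names return to their original positions.

```r
df <- data.frame(id = 1:4,
                 grp = c("b", "a", "b", "a"),
                 row.names = c("r1", "r2", "r3", "r4"))

# Split by group, then put the rows back in the positions given by grp.
pieces   <- split(df, df$grp)
restored <- unsplit(pieces, df$grp)

rownames(restored)
restored$id
```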
f is recycled as necessary, and if the length of x is not
a multiple of the length of f, a warning is printed.
Any missing values in f are dropped together with the
corresponding values of x.
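The recycling and missing-value rules can be seen in a short sketch; the vectors here are invented for illustration.

```r
# f has length 2 and is recycled along x (6 is a multiple of 2).
recycled <- split(1:6, c("odd", "even"))
recycled$odd

# An NA in f drops the corresponding element of x from the result.
f   <- c("a", NA, "b", "a")
res <- split(c(10, 20, 30, 40), f)
res
```

Because the value 20 is paired with NA, it does not appear in any group, so the result holds three values rather than four.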
When f is a list, the default method calls interaction. If the
levels of the factors contain ".", the groups may not be split as
expected, so the method has an argument sep which is used to join
the levels.
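A hedged sketch of why sep matters: the level names below (such as "v1.0") are invented, and the example only shows how the group names change, not any internal behaviour beyond that.

```r
x  <- 1:4
f1 <- c("v1.0", "v1.0", "v2.0", "v2.0")
f2 <- c("x", "y", "x", "y")

# With the default separator the dot inside "v1.0" cannot be told apart
# from the dot that joins the two factors.
by_dot <- split(x, list(f1, f2))
names(by_dot)

# A separator that does not occur in the levels keeps the names unambiguous.
by_underscore <- split(x, list(f1, f2), sep = "_")
names(by_underscore)
```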
Value
The value returned from split is a list of vectors containing
the values for the groups. The components of the list are named by
the levels of f (after converting to a factor, or if already a
factor and drop = TRUE, dropping unused levels).
The replacement forms return their right hand side. unsplit
returns a vector or data frame x for which split(x, f) equals
value.
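The relationship between split, unsplit and the replacement form can be checked with a short round trip. The data and the centring step are illustrative assumptions, chosen so the arithmetic is easy to follow.

```r
x <- c(5, 3, 8, 1, 9, 2)
f <- factor(c("a", "b", "a", "b", "a", "b"))

# Round trip: unsplit() puts the pieces back where f says they belong.
parts      <- split(x, f)
round_trip <- unsplit(parts, f)

# Replacement form: overwrite each group with its centred values.
centered <- x
split(centered, f) <- lapply(split(x, f), function(v) v - mean(v))
tapply(centered, f, mean)
```

The replacement form modifies centered in place, which suits cases where the result has the same shape as the input; unsplit is the more natural choice when the pieces gain new columns.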
References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole.
See Also
cut to categorize numeric values.
strsplit to split strings.
Examples
require(stats); require(graphics)
n <- 10; nn <- 100
g <- factor(round(n * runif(n * nn)))
x <- rnorm(n * nn) + sqrt(as.numeric(g))
xg <- split(x, g)
boxplot(xg, col = "lavender", notch = TRUE, varwidth = TRUE)
sapply(xg, length)
sapply(xg, mean)
### Calculate 'z-scores' by group (standardize to mean zero, variance one)
z <- unsplit(lapply(split(x, g), scale), g)
# or
zz <- x
split(zz, g) <- lapply(split(x, g), scale)
# and check that the within-group std dev is indeed one
tapply(z, g, sd)
tapply(zz, g, sd)
### data frame variation
## Notice that assignment form is not used since a variable is being added
g <- airquality$Month
l <- split(airquality, g)
l <- lapply(l, transform, Oz.Z = scale(Ozone))
aq2 <- unsplit(l, g)
head(aq2)
with(aq2, tapply(Oz.Z, Month, sd, na.rm = TRUE))
### Split a matrix into a list by columns
ma <- cbind(x = 1:10, y = (-4:5)^2)
split(ma, col(ma))
split(1:10, 1:2)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> ### Name: split
> ### Title: Divide into Groups and Reassemble
> ### Aliases: split split.default split.data.frame split<- split<-.default
> ### split<-.data.frame unsplit
> ### Keywords: category
>
> ### ** Examples
>
> require(stats); require(graphics)
> n <- 10; nn <- 100
> g <- factor(round(n * runif(n * nn)))
> x <- rnorm(n * nn) + sqrt(as.numeric(g))
> xg <- split(x, g)
> boxplot(xg, col = "lavender", notch = TRUE, varwidth = TRUE)
> sapply(xg, length)
  0   1   2   3   4   5   6   7   8   9  10
 43  98  87 101 107  86 126 106  99 103  44
> sapply(xg, mean)
       0        1        2        3        4        5        6        7
1.119855 1.445240 1.579890 1.958273 2.269893 2.630788 2.544226 2.959390
       8        9       10
3.089318 3.275701 3.492476
>
> ### Calculate 'z-scores' by group (standardize to mean zero, variance one)
> z <- unsplit(lapply(split(x, g), scale), g)
>
> # or
>
> zz <- x
> split(zz, g) <- lapply(split(x, g), scale)
>
> # and check that the within-group std dev is indeed one
> tapply(z, g, sd)
 0  1  2  3  4  5  6  7  8  9 10
 1  1  1  1  1  1  1  1  1  1  1
> tapply(zz, g, sd)
 0  1  2  3  4  5  6  7  8  9 10
 1  1  1  1  1  1  1  1  1  1  1
>
>
> ### data frame variation
>
> ## Notice that assignment form is not used since a variable is being added
>
> g <- airquality$Month
> l <- split(airquality, g)
> l <- lapply(l, transform, Oz.Z = scale(Ozone))
> aq2 <- unsplit(l, g)
> head(aq2)
  Ozone Solar.R Wind Temp Month Day       Oz.Z
1    41     190  7.4   67     5   1  0.7822293
2    36     118  8.0   72     5   2  0.5572518
3    12     149 12.6   74     5   3 -0.5226399
4    18     313 11.5   62     5   4 -0.2526670
5    NA      NA 14.3   56     5   5         NA
6    28      NA 14.9   66     5   6  0.1972879
> with(aq2, tapply(Oz.Z, Month, sd, na.rm = TRUE))
5 6 7 8 9
1 1 1 1 1
>
>
> ### Split a matrix into a list by columns
> ma <- cbind(x = 1:10, y = (-4:5)^2)
> split(ma, col(ma))
$`1`
[1] 1 2 3 4 5 6 7 8 9 10
$`2`
[1] 16 9 4 1 0 1 4 9 16 25
>
> split(1:10, 1:2)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10