Most data operations are most useful when done on groups defined by
variables in the dataset. The group_by function takes an existing tbl
and converts it into a grouped tbl, where operations are performed
"by group".
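To make "by group" concrete, the following sketch computes a per-group mean with group_by and summarise, then checks it against a base R equivalent. The names by_cyl, dplyr_means and base_means are illustrative, and the comparison with tapply is an addition for clarity, not part of the function's documented interface.

```r
library(dplyr)

# Grouped summary with dplyr: one row per distinct value of cyl.
by_cyl <- group_by(mtcars, cyl)
dplyr_means <- summarise(by_cyl, mean_disp = mean(disp))

# The same per-group computation in base R, for comparison.
base_means <- tapply(mtcars$disp, mtcars$cyl, mean)

# Compare the two sets of group means (names dropped from the base result).
all.equal(dplyr_means$mean_disp, as.numeric(base_means))
```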
...
Variables to group by. All tbls accept variable names;
some will also accept functions of variables. Duplicated groups
will be silently dropped.
add
By default, when add = FALSE, group_by will
override existing groups. To instead add to the existing groups,
use add = TRUE.
.dots
Used to work around non-standard evaluation. See
vignette("nse") for details.
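A minimal sketch of how .dots might be used when the grouping variables are held as strings. It assumes the standard-evaluation counterpart group_by_(), which is not named on this page and belongs to dplyr versions contemporary with it; later releases deprecate it. The name grouping_cols is illustrative.

```r
library(dplyr)

# Column names chosen at run time, e.g. from user input (illustrative).
grouping_cols <- c("cyl", "am")

# Assumption: group_by_() is the standard-evaluation variant of group_by()
# and accepts a character vector of column names through .dots.
by_cols <- group_by_(mtcars, .dots = grouping_cols)

# Inspect which variables the result is grouped by.
groups(by_cols)
```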
Tbl types
group_by is an S3 generic with methods for the built-in
tbls. See the help for the corresponding classes and their manip
methods for more details:
data.frame: grouped_df
data.table: grouped_dt
SQLite: src_sqlite
PostgreSQL: src_postgres
MySQL: src_mysql
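For the data.frame case, the class of the result can be checked directly. This is a small sketch using base R's inherits; it covers only the grouped_df method and makes no claim about the data.table or database backends.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# A grouped data frame carries the grouped_df class and is still a data.frame.
inherits(by_cyl, "grouped_df")
inherits(by_cyl, "data.frame")

# ungroup() removes the grouping, and with it the grouped_df class.
inherits(ungroup(by_cyl), "grouped_df")
```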
See Also
ungroup for the inverse operation,
groups for accessors that don't do special evaluation.
Examples
by_cyl <- group_by(mtcars, cyl)
summarise(by_cyl, mean(disp), mean(hp))
filter(by_cyl, disp == max(disp))
# summarise peels off a single layer of grouping
by_vs_am <- group_by(mtcars, vs, am)
by_vs <- summarise(by_vs_am, n = n())
by_vs
summarise(by_vs, n = sum(n))
# use ungroup() to remove if not wanted
summarise(ungroup(by_vs), n = sum(n))
# You can group by expressions: this is just short-hand for
# a mutate/rename followed by a simple group_by
group_by(mtcars, vsam = vs + am)
group_by(mtcars, vs2 = vs)
# You can also group by a constant, but it's not very useful
group_by(mtcars, "vs")
# By default, group_by sets groups. Use add = TRUE to add groups
groups(group_by(by_cyl, vs, am))
groups(group_by(by_cyl, vs, am, add = TRUE))
# Duplicate groups are silently dropped
groups(group_by(by_cyl, cyl, cyl))