With the exception of calling Multiplyr() to create a new data frame, none
of the methods or fields documented here are intended for general use; it is
generally best to stick to the manipulation functions. For a better overview,
run the following command: vignette("basics")
Arguments
...
Either a data frame or a list of name=value pairs
cl
Cluster object, number of nodes or NULL (default)
alloc
Allocate additional columns
auto_compact
Automatically compact data after filter operations
auto_partition
Automatically re-partition after group_by
profiling
Enable internal profiling code
Value
Object of class Multiplyr
Fields
auto_compact
Compact data after each filtering (or similar) operation
auto_partition
Re-partition after group_by
bindenv
Environment for within_group etc. operations
bm
big.matrix (internal representation of data)
bm.master
big.matrix for certain operations that need non-subsetted data
cls
SOCKcluster created by parallel package
col.names
Name of each column; names starting with "." are special, and NA marks a free column
desc.master
big.matrix.descriptor for setting up shared memory access
empty
Flag indicating that this data frame is empty
factor.cols
Which columns are factors/character
factor.levels
List (same length as factor.cols) containing corresponding factor levels
filtercol
Which column in bm indicates filtering (1=included, 0=excluded)
filtered
Flag indicating that this data frame has had filtering applied
first
Subsetting: first row
group.cols
Which columns are involved in grouping
groupcol
Which column in bm contains the group ID
grouped
Flag indicating whether grouped
groupenv
List of environments corresponding to group IDs in group
group_max
Number of groups
group_partition
Flag indicating that partition_group() has been used
group_sizes_stale
Flag indicating that group sizes need to be re-calculated
group
Which group IDs are assigned to this data frame
last
Subsetting: last row
nsamode
Flag indicating whether data frame is in no-strings-attached mode
order.cols
Display order of columns
pad
Number of spaces to pad each column or 0 for dynamic
profile_names
Profile names
profile_real
Total elapsed time for each profile
profile_rreal
Reference time for total elapsed
profile_rsys
Reference time for system
profile_ruser
Reference time for user
profile_sys
Total system time for each profile
profile_user
Total user time for each profile
profiling
Flag indicating that profiling is to be used
slave
Flag indicating whether cluster_* operations are valid
tmpcol
Which column may be used for temporary calculations
type.cols
Column type (0=numeric, 1=character, 2=factor)
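Because a big.matrix holds numeric values only, several of the fields above (type.cols, factor.cols, factor.levels, filtercol) describe how non-numeric columns and filtering can be encoded inside one numeric matrix. The following is a minimal base-R sketch of that idea, assuming an ordinary matrix in place of a big.matrix; all variable names are illustrative and are not multiplyr's actual internals.

```r
# Base-R sketch only; names below are illustrative, not multiplyr internals.
df <- data.frame(x = c(1.5, 2.5, 3.5, 4.5),
                 G = c("A", "B", "A", "B"),
                 stringsAsFactors = FALSE)

# Encode the character column as integer codes plus a lookup table of levels.
g_levels <- sort(unique(df$G))
g_codes  <- match(df$G, g_levels)

# One numeric matrix holds the data plus a filter column (1 = included).
m <- cbind(x = df$x, G = g_codes, .filter = 1)

col_types     <- c(x = 0, G = 1)        # 0 = numeric, 1 = character, 2 = factor
factor_cols   <- which(col_types > 0)   # columns that need decoding
factor_levels <- list(g_levels)         # same length as factor_cols

# "Filter" by flagging rows (1 = included, 0 = excluded) instead of removing them.
m[, ".filter"] <- as.numeric(m[, "x"] > 2)

# Decode only the rows that are still included.
keep    <- m[, ".filter"] == 1
decoded <- data.frame(x = m[keep, "x"],
                      G = factor_levels[[1]][m[keep, "G"]],
                      stringsAsFactors = FALSE)
```

In this sketch, excluded rows stay in the matrix until they are physically dropped, which is roughly the step that a compaction pass would perform.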
Methods
alloc_col(name = ".tmp", update = FALSE)
Allocate a new column and optionally update cluster nodes to do the same. Returns the column number
build_grouped()
Build group environments
calc_group_sizes(delay = TRUE)
Calculate group sizes (if delay=TRUE then this will just mark group sizes as being stale)
Retrieve the given rows (i) and columns (j). With drop=TRUE and a single column, this returns a vector; otherwise it returns a standard data.frame. If no-strings-attached mode is enabled, this returns only a vector or a matrix
group_cache_attach(descres)
Attach data frame to group_cache
group_restrict(grpid = NULL)
Restricts data to the specified group ID only. If grpid is NULL, removes the restriction.
Sorts data by the specified (numeric) columns or by translating from a lazy_dots object. with.group ensures that the sort is by grouping columns first, so that each group's rows stay contiguous
submatrix(a, b)
Returns a sub.big.matrix between specified rows (a:b)
update_fields(fieldnames)
Update the specified fields of the cluster nodes' data frames to match this one's
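The delay argument of calc_group_sizes() and the group_sizes_stale field together suggest a "mark stale now, recompute on demand" pattern. The base-R sketch below illustrates that general pattern only; the state environment and the calc_sizes() and get_sizes() helpers are hypothetical and are not part of multiplyr.

```r
# Hypothetical base-R sketch of a stale-flag cache; not multiplyr's own code.
state <- new.env()
state$group_ids   <- c(1, 1, 2, 2, 2)
state$group_sizes <- NULL
state$stale       <- TRUE

calc_sizes <- function(state, delay = TRUE) {
  if (delay) {
    state$stale <- TRUE            # defer the work; just mark sizes as stale
    return(invisible(NULL))
  }
  state$group_sizes <- as.vector(table(state$group_ids))
  state$stale <- FALSE
  invisible(state$group_sizes)
}

get_sizes <- function(state) {
  # Recompute only if something has invalidated the cached sizes.
  if (state$stale) calc_sizes(state, delay = FALSE)
  state$group_sizes
}

calc_sizes(state)                          # only marks sizes as stale
state$group_ids <- c(state$group_ids, 1)   # data changes while stale
sizes <- get_sizes(state)                  # recomputed on demand
```

Deferring the calculation this way avoids repeating it after every intermediate operation when only the final sizes are needed.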
Examples
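library(multiplyr)

# Create from name=value pairs on a 2-node cluster, then shut the cluster down
dat <- Multiplyr(x = 1:100, G = rep(c("A", "B"), each = 50), cl = 2)
dat %>% shutdown()

# Create from an existing data frame
dat.df <- data.frame(x = 1:100, G = rep(c("A", "B"), each = 50))
dat <- Multiplyr(dat.df, cl = 2)
dat %>% shutdown()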