Package 'bit' provides bitmapped vectors of booleans (no NAs),
coercion from and to logicals, integers and integer subscripts,
as well as fast boolean operators and fast summary statistics.
With bit vectors you can store true binary booleans {FALSE, TRUE} at the cost
of only 1 bit each; on a 32-bit architecture this means a factor of 32 less RAM
and a factor of 32 more speed on boolean operations. With this speed gain it even
pays off to convert to bit in order to avoid a single boolean operation on
logicals or a single set operation on (longer) integer subscripts; the pay-off
is dramatic when such components are used more than once.
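The storage idea can be illustrated with base R alone. The sketch below is not the package's C implementation: it assumes base R's `packBits()` and `intToBits()` to pack a logical vector into 32-bit integer words, and maps NA to TRUE as documented for `as.bit.logical`. The helper names `pack_bits` and `unpack_bits` are hypothetical.

```r
# Minimal sketch, not the 'bit' package's own implementation:
# pack a logical vector into 32-bit integer words with base R only.
pack_bits <- function(l) {
  l[is.na(l)] <- TRUE                    # mirror the documented NA => TRUE mapping
  n <- length(l)
  padded <- c(l, logical((-n) %% 32L))   # pad with FALSE up to a multiple of 32
  structure(packBits(padded, type = "integer"), n = n)
}

unpack_bits <- function(words) {
  n <- attr(words, "n")
  bits <- as.integer(intToBits(words)) == 1L  # 32 bits per word, lowest bit first
  bits[seq_len(n)]                            # drop the padding
}

l <- c(TRUE, FALSE, NA, rep(c(FALSE, TRUE), 50))  # 103 logicals, 4 bytes each
w <- pack_bits(l)                                  # 4 integer words (128 bits)
u <- unpack_bits(w)                                # the NA comes back as TRUE
```

Each R logical occupies 4 bytes (32 bits), so storing one boolean per bit is where the factor of 32 comes from.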
Reading from and writing to bit is approximately as fast as accessing standard
logicals, mostly because of the time R spends on memory allocation. The package
allows you to work with pre-allocated memory for return values by calling .Call()
directly: with pre-allocated vector memory, C-level copying from bit to logical
takes only 70% of the time needed for copying from logical to logical, while
copying from logical to bit comes at a performance penalty of 150%.
Since bit objects cannot be used as subscripts in R, a second class 'bitwhich'
allows selections to be stored as efficiently as possible with standard R types.
This is useful either to represent parts of bit objects or to represent
very asymmetric selections.
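The idea of storing an asymmetric selection compactly with standard R subscripts can be sketched in base R: keep whichever side is smaller, as positive subscripts for the selected positions or negative subscripts for the excluded ones. `compact_selection` is a hypothetical helper, and this is an assumption about the general approach, not a description of the actual internals of 'bitwhich'.

```r
# Hypothetical helper: encode a logical selection (assumed to contain no NAs)
# with standard R subscripts, keeping whichever side is smaller.
compact_selection <- function(l) {
  n <- length(l)
  n_true <- sum(l)
  if (n_true == 0L) {
    FALSE                  # nothing selected
  } else if (n_true == n) {
    TRUE                   # everything selected
  } else if (n_true <= n - n_true) {
    which(l)               # few TRUE: positive subscripts
  } else {
    -which(!l)             # few FALSE: negative subscripts
  }
}

x <- 1:10
sel <- c(rep(TRUE, 9), FALSE)
compact_selection(sel)     # one excluded position instead of nine selected ones
```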
Class 'ri' (range index) allows selecting ranges of positions for chunked processing.
All three classes 'bit', 'bitwhich' and 'ri' can be used for subsetting 'ff' objects (ff-2.1.0 and higher).
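Chunked processing over ranges can be sketched in base R by splitting positions 1..n into consecutive (from, to) ranges and handling one range at a time. `chunk_ranges` is a hypothetical helper that only mimics the idea behind 'ri'; it does not use the package's `ri` class.

```r
# Hypothetical helper: split positions 1..n (n >= 1) into consecutive
# ranges of at most `size` elements, returned as a two-column matrix.
chunk_ranges <- function(n, size) {
  from <- seq(1L, n, by = size)
  to <- pmin(from + size - 1L, n)
  cbind(from = from, to = to)
}

x <- sample(c(FALSE, TRUE), 1000, replace = TRUE)
r <- chunk_ranges(length(x), 300L)
# process one range at a time, e.g. count TRUE values per chunk
counts <- apply(r, 1, function(ft) sum(x[ft[1]:ft[2]]))
```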
Usage
bit(length)
## S3 method for class 'bit'
print(x, ...)
Arguments
length   length of vector in bits
x        a bit vector
...      further arguments to print
Details
Package:   bit
Type:      Package
Version:   1.1.0
Date:      2012-06-05
License:   GPL-2
LazyLoad:  yes
Encoding:  latin1
Index:
bit function     bitwhich function     ri function    see also    description
.BITS                                                 globalenv   variable holding number of bits on this system
bit_init                                              .First.lib  initially allocate bit-masks (done in .First.lib)
bit_done                                              .Last.lib   finally de-allocate bit-masks (done in .Last.lib)
bit              bitwhich              ri             logical     create bit object
print.bit        print.bitwhich        print.ri       print       print bit vector
length.bit       length.bitwhich       length.ri      length      get length of bit vector
length<-.bit     length<-.bitwhich                    length<-    change length of bit vector
c.bit            c.bitwhich                           c           concatenate bit vectors
is.bit           is.bitwhich           is.ri          is.logical  test for bit class
as.bit           as.bitwhich                          as.logical  generically coerce to bit or bitwhich
as.bit.logical   as.bitwhich.logical                  logical     coerce logical to bit vector (FALSE => FALSE, c(NA, TRUE) => TRUE)
as.bit.integer   as.bitwhich.integer                  integer     coerce integer to bit vector (0 => FALSE, ELSE => TRUE)
as.bit.double    as.bitwhich.double                   double      coerce double to bit vector (0 => FALSE, ELSE => TRUE)
as.double.bit    as.double.bitwhich    as.double.ri   as.double   coerce bit vector to double (0/1)
as.integer.bit   as.integer.bitwhich   as.integer.ri  as.integer  coerce bit vector to integer (0L/1L)
as.logical.bit   as.logical.bitwhich   as.logical.ri  as.logical  coerce bit vector to logical (FALSE/TRUE)
as.which.bit     as.which.bitwhich     as.which.ri    as.which    coerce bit vector to positive integer subscripts
as.bit.which     as.bitwhich.which                    bitwhich    coerce integer subscripts to bit vector
as.bit.bitwhich  as.bitwhich.bitwhich                             coerce from bitwhich
as.bit.bit       as.bitwhich.bit                      UseMethod   coerce from bit
as.bit.ri        as.bitwhich.ri                                   coerce from range index
as.bit.ff                                             ff          coerce ff boolean to bit vector
as.ff.bit                                             as.ff       coerce bit vector to ff boolean
as.hi.bit        as.hi.bitwhich        as.hi.ri       as.hi       coerce to hybrid index (requires package ff)
as.bit.hi        as.bitwhich.hi                                   coerce from hybrid index (requires package ff)
[[.bit                                                [[          get single bit (index checked)
[[<-.bit                                              [[<-        set single bit (index checked)
[.bit                                                 [           get vector of bits (unchecked)
[<-.bit                                               [<-         set vector of bits (unchecked)
!.bit            !.bitwhich                           !           boolean NOT on bit
&.bit            &.bitwhich                           &           boolean AND on bit
|.bit            |.bitwhich                           |           boolean OR on bit
xor.bit          xor.bitwhich                         xor         boolean XOR on bit
!=.bit           !=.bitwhich                          !=          boolean inequality (same as XOR)
==.bit           ==.bitwhich                          ==          boolean equality
all.bit          all.bitwhich          all.ri         all         aggregate AND
any.bit          any.bitwhich          any.ri         any         aggregate OR
min.bit          min.bitwhich          min.ri         min         aggregate MIN (first TRUE position)
max.bit          max.bitwhich          max.ri         max         aggregate MAX (last TRUE position)
range.bit        range.bitwhich        range.ri       range       aggregate [MIN,MAX]
sum.bit          sum.bitwhich          sum.ri         sum         aggregate SUM (count of TRUE)
summary.bit      summary.bitwhich      summary.ri     tabulate    aggregate c(nFALSE, nTRUE, minRange, maxRange)
regtest.bit                                                       regression tests for the package
(ri objects work as the second argument in bit and bitwhich ops.)
Value
bit returns an integer vector just long enough to store 'length' bits,
with an attribute 'n' and class 'bit'.
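To make this layout concrete, the base-R sketch below locates bit i in such a vector at word (i - 1) %/% 32 + 1 and offset (i - 1) %% 32 + 1, and performs the kind of range check that a checked single-bit accessor like `[[.bit` needs. It assumes 32 bits per integer word in lowest-bit-first order (as base R's `packBits()` and `intToBits()` use); `new_words`, `bit_get` and `bit_set` are hypothetical names, not the package's API or C code.

```r
# Hypothetical base-R accessors (not the package's C code) for a vector of
# integer words that carries the number of bits in attribute "n".
new_words <- function(n) structure(integer(ceiling(n / 32)), n = n)

check_index <- function(words, i) {
  # [[-style checking: a single positive whole number within 1..n
  if (length(i) != 1L || is.na(i) || i < 1 || i > attr(words, "n") || i != floor(i))
    stop("subscript out of bounds")
}

bit_get <- function(words, i) {
  check_index(words, i)
  w <- (i - 1) %/% 32 + 1              # which integer word holds bit i
  off <- (i - 1) %% 32 + 1             # position within that word, 1..32
  as.integer(intToBits(words[w]))[off] == 1L
}

bit_set <- function(words, i, value) {
  check_index(words, i)
  w <- (i - 1) %/% 32 + 1
  off <- (i - 1) %% 32 + 1
  bits <- intToBits(words[w])          # the word as 32 raw 00/01 values
  bits[off] <- as.raw(isTRUE(value))
  # Setting only bit 32 gives the pattern R prints as NA_integer_;
  # it is still valid storage and intToBits() reads it back correctly.
  words[w] <- packBits(bits, type = "integer")
  words
}

b <- new_words(40)         # 40 bits fit into 2 integer words
b <- bit_set(b, 32, TRUE)  # last bit of word 1
b <- bit_set(b, 33, TRUE)  # first bit of word 2
```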
Note
Currently, operations on bit objects carry some overhead from R calls. Expect speed gains for vectors
of length ~10000 or longer.
Since this package was created for high-performance purposes, only positive integer subscripts are allowed:
the '[.bit' and '[<-.bit' methods don't check whether the subscripts are positive integers in the allowed range.
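Because the unchecked methods trust their subscripts, callers who cannot guarantee valid input may want to validate it first. The helper below is a hypothetical base-R guard, not part of the package; it accepts only positive whole-number subscripts within 1..n.

```r
# Hypothetical guard: stop unless every subscript is a positive whole
# number no larger than n, the only kind the unchecked methods expect.
# Possible use with a bit vector x: x[check_positive_subscripts(i, length(x))]
check_positive_subscripts <- function(i, n) {
  ok <- is.numeric(i) && !anyNA(i) && all(i >= 1) && all(i <= n) &&
    all(i == floor(i))
  if (!ok) stop("subscripts must be positive integers in 1..", n)
  invisible(as.integer(i))
}

check_positive_subscripts(c(1, 2, 16), 16)  # passes, returns 1:2 and 16 as integer
```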
All R functions behave as expected, i.e. they do not modify their arguments and they create new return values.
If you want to save the time for return-value memory allocation, you must use .Call directly
(see the dontrun example in sum.bit).
Note that the package has not been tested on 64-bit systems.
Note also that the mapping of NAs to TRUE differs from the mapping of NAs to FALSE
in vmode="boolean" in package ff (and one of the two may change in the future).
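The two NA conventions can be spelled out explicitly in base R. `na_as_true` mirrors the mapping documented for bit (c(NA, TRUE) => TRUE) and `na_as_false` the one described for ff's vmode="boolean"; both helpers are hypothetical illustrations, not functions from either package.

```r
# Hypothetical helpers spelling out the two NA conventions in base R.
na_as_true  <- function(l) replace(l, is.na(l), TRUE)   # as documented for bit
na_as_false <- function(l) replace(l, is.na(l), FALSE)  # as described for ff boolean

l <- c(TRUE, NA, FALSE)
sum(na_as_true(l))    # the NA is counted as selected
sum(na_as_false(l))   # the NA is counted as not selected
```

The same logical input therefore yields different counts and selections depending on which package it passes through.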
Examples
library(bit)
x <- bit(12) # create bit vector
x # autoprint bit vector
length(x) <- 16 # change length
length(x) # get length
x[[2]] # extract single element
x[[2]] <- TRUE # replace single element
x[1:2] # extract parts of bit vector
x[1:2] <- TRUE # replace parts of bit vector
as.which(x) # coerce bit to subscripts
x <- as.bit.which(3:4, 4) # coerce subscripts to bit
as.logical(x) # coerce bit to logical
y <- as.bit(c(FALSE, TRUE, FALSE, TRUE)) # coerce logical to bit
is.bit(y) # test for bit
!x # boolean NOT
x & y # boolean AND
x | y # boolean OR
xor(x, y) # boolean Exclusive OR
x != y # boolean inequality (same as xor)
x == y # boolean equality
all(x) # aggregate AND
any(x) # aggregate OR
min(x) # aggregate MIN (integer version of ALL)
max(x) # aggregate MAX (integer version of ANY)
range(x) # aggregate [MIN,MAX]
sum(x) # aggregate SUM (count of TRUE)
summary(x) # aggregate count of FALSE and TRUE
## Not run:
message("\nEven for a single boolean operation transforming logical to bit pays off")
n <- 10000000
x <- sample(c(FALSE, TRUE), n, TRUE)
y <- sample(c(FALSE, TRUE), n, TRUE)
system.time(x|y)
system.time({
x <- as.bit(x)
y <- as.bit(y)
})
system.time( z <- x | y )
system.time( as.logical(z) )
message("Even more so if multiple operations are needed :-)")
message("\nEven for a single set operation transforming subscripts to bit pays off\n")
n <- 10000000
x <- sample(n, n/2)
y <- sample(n, n/2)
system.time( union(x,y) )
system.time({
x <- as.bit.which(x, n)
y <- as.bit.which(y, n)
})
system.time( as.which.bit( x | y ) )
message("Even more so if multiple operations are needed :-)")
message("\nSome timings WITH memory allocation")
n <- 2000000
l <- sample(c(FALSE, TRUE), n, TRUE)
# copy logical to logical
system.time(for(i in 1:100){ # 0.0112
l2 <- l
l2[1] <- TRUE # force new memory allocation (copy on modify)
rm(l2)
})/100
# copy logical to bit
system.time(for(i in 1:100){ # 0.0123
b <- as.bit(l)
rm(b)
})/100
# copy bit to logical
b <- as.bit(l)
system.time(for(i in 1:100){ # 0.009
l2 <- as.logical(b)
rm(l2)
})/100
# copy bit to bit
b <- as.bit(l)
system.time(for(i in 1:100){ # 0.009
b2 <- b
b2[1] <- TRUE # force new memory allocation (copy on modify)
rm(b2)
})/100
l2 <- l
# replace logical by TRUE
system.time(for(i in 1:100){
l[] <- TRUE
})/100
# replace bit by TRUE (NOTE that we recycle the assignment
# value on R side == memory allocation and assignment first)
system.time(for(i in 1:100){
b[] <- TRUE
})/100
# THUS the following is faster
system.time(for(i in 1:100){
b <- !bit(n)
})/100
# replace logical by logical
system.time(for(i in 1:100){
l[] <- l2
})/100
# replace bit by logical
system.time(for(i in 1:100){
b[] <- l2
})/100
# extract logical
system.time(for(i in 1:100){
l2[]
})/100
# extract bit
system.time(for(i in 1:100){
b[]
})/100
message("\nSome timings WITHOUT memory allocation (Serge, that's for you)")
n <- 2000000L
l <- sample(c(FALSE, TRUE), n, TRUE)
b <- as.bit(l)
# read from logical, write to logical
l2 <- logical(n)
system.time(for(i in 1:100).Call("R_filter_getset", l, l2, PACKAGE="bit")) / 100
# read from bit, write to logical
l2 <- logical(n)
system.time(for(i in 1:100).Call("R_bit_get", b, l2, c(1L, n), PACKAGE="bit")) / 100
# read from logical, write to bit
system.time(for(i in 1:100).Call("R_bit_set", b, l2, c(1L, n), PACKAGE="bit")) / 100
## End(Not run)
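The set-operation trick timed above (turn subscripts into a bit mask, OR the masks, convert back to subscripts) can be mirrored with plain logical masks in base R, which shows why the result equals a sorted `union()`. This sketch uses logical vectors instead of bit objects, so it demonstrates the logic, not the speed-up; `union_via_mask` is a hypothetical name.

```r
# Hypothetical base-R analogue of as.which(as.bit.which(x, n) | as.bit.which(y, n))
union_via_mask <- function(x, y, n) {
  mx <- logical(n); mx[x] <- TRUE   # subscripts -> mask
  my <- logical(n); my[y] <- TRUE
  which(mx | my)                    # OR the masks, back to sorted subscripts
}

n <- 1000L
x <- sample(n, n / 2)
y <- sample(n, n / 2)
u <- union_via_mask(x, y, n)        # same elements as union(x, y), but sorted
```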