R Graphical Manual


Last data update: 2014.03.03


read.bin

R Documentation

File processing function for binary files.

Description

A function to process binary accelerometer files and convert the information into R objects.

Usage

read.bin(binfile, outfile = NULL, start = NULL, end = NULL,
    verbose = TRUE, do.temp = TRUE, do.volt = TRUE, calibrate = TRUE,
    downsample = NULL, blocksize, virtual = FALSE,
    mmap.load = (.Machine$sizeof.pointer >= 8), pagerefs = TRUE, ...)

Arguments

`binfile`	The name of the .bin file to process.
`outfile`	An optional filename specifying where to save the processed data object.
`start`	A specification of where in the file to begin processing: a page number, a proportion of the data, or a time string. See Details.
`end`	A specification of where in the file to end processing, in the same forms as `start`. See Details.
`verbose`	A boolean variable indicating whether progress information should be printed during processing.
`do.temp`	A boolean variable indicating whether the temperature signal should be extracted.
`do.volt`	A boolean variable indicating whether the voltage signal should be extracted.
`calibrate`	A boolean variable indicating whether the raw accelerometer values and the light variable should be calibrated according to the calibration data in the headers.
`downsample`	A variable indicating the type of downsampling to apply to the data as it is loaded. It can take the values: `NULL` (default), no downsampling; a single number, which reads every `downsample`-th value, starting from the first; or a numeric vector of length two, which reads every `downsample[1]`-th value, starting from the `downsample[2]`-th. Non-integer factors, or factors that do not divide 300 (the number of observations per data page), are allowed, but they lead to imprecise frequency calculations, introduce leap seconds, and can cause problems with other methods. Use with care.
`blocksize`	An integer giving the maximum number of data pages to read in each pass. Defaults to 10000 for larger data files. Smaller values split very large data files into chunks that are read one at a time, reducing the memory requirements of read.bin (without affecting the final object) at the possible cost of longer processing time. Set to Inf for no splitting.
`virtual`	logical. If TRUE, no actual data reading is done. Instead, a VirtAccData object containing header information is constructed, for use with `get.intervals`.
`mmap.load`	logical. If TRUE (the default on 64-bit R), use the `mmap` package to process the binfile.
`pagerefs`	Considered only when `mmap.load = TRUE`. It can take one of three forms: `NULL` or `FALSE`, in which case page references are calculated dynamically for each record; a vector giving sorted byte offsets for each record, for mmap reading of data files; or `TRUE` (the default), in which case a full page reference table is computed before any processing occurs. Computing the full table up front takes a little extra time, but it is safer than dynamic computation when pages are missing or the temperature varies strongly. Once page references have been calculated, later reads of the same file are much faster, provided the previously computed references are supplied.
`...`	Any other optional arguments affecting manual calibration and data processing. These are: `gain`: a vector of 3 values for manual gain calibration of the raw (x, y, z) axes. If `gain = NULL`, the gain calibration values are taken from the .bin file itself. `offset`: a vector of 3 values for manual offset calibration of the raw (x, y, z) axes. If `offset = NULL`, the offset calibration values are taken from the .bin file itself. `luxv`: a value for manual lux calibration of the light meter. If `luxv = NULL`, the lux calibration value is taken from the .bin file itself. `voltv`: a value for manual volts calibration of the light meter. If `voltv = NULL`, the volts calibration value is taken from the .bin file itself. `warn`: if `TRUE`, give a warning if the input file is large, and require user confirmation.
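The record selection described for `downsample` can be sketched in base R. `downsample_index()` is a hypothetical helper, not part of GENEAread, and it assumes the selection is applied to the concatenated record index; the package may instead apply it page by page. The last line shows why factors that divide 300 keep the same offset within every page.

```r
# Hypothetical sketch, not GENEAread code: the record indices kept by a
# downsample specification, out of n records in total.
downsample_index <- function(n, downsample) {
  if (is.null(downsample)) return(seq_len(n))          # NULL: keep everything
  step  <- downsample[1]
  first <- if (length(downsample) >= 2) downsample[2] else 1
  seq(from = first, to = n, by = step)                 # every step-th record
}

# 104 pages of 300 observations at 100 Hz, downsampled by a factor of 10
idx <- downsample_index(104 * 300, 10)
length(idx)        # 3120 records
100 / 10           # effective frequency: 10 Hz

# Factors that divide 300 keep the same offset within every page
300 %% c(10, 7)    # 0 for a factor of 10, 6 for a factor of 7
```

The record count and frequency agree with the `downsample = 10` run in the Results.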

Details

The read.bin function reads binary files compatible with the GENEActiv line of accelerometers, for further processing by the other functions in this package. The default options suit the most common cases, though users are advised to consider setting start and end to narrower intervals and/or applying some downsampling when working with data files longer than 24 hours.

The function reads in the desired analysis time window specified by start and end. For convenience, a variety of time window formats are accepted:

Large integers are read as page numbers in the dataset. Page numbers larger than the number of pages available in the file are constrained to what is available. Note that the first page is page 1.

Small values (between 0 and 1) are taken as proportions of the data. For example, 'start = 0.5' would specify that reading should begin at the midpoint of the data.

Strings are interpreted as dates and times using parse.time. In particular, times specified as "HH:MM" or "HH:MM:SS" are taken as the earliest time interval in the file containing these times. Strings with an integer prepended, separated by a space, are interpreted as that time after the given number of midnights have passed; in other words, the given time of day on the Nth *full* day. Days of the week and dates in "day/month", "day/month/year", "month-day" and "year-month-day" formats are also handled. Note that times are interpreted in the same time zone as the data recording itself.
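The numeric and clock-time rules above can be illustrated with a small base R sketch. `resolve_numeric()` and `resolve_clock()` are hypothetical helpers, not GENEAread's implementation (which uses parse.time). They assume proportions are rounded up to a page, and they handle times in UTC as a stand-in for local recording time. The example output suggests the package also snaps times to page boundaries, which this sketch does not do.

```r
# Hypothetical sketch, not GENEAread's implementation (which uses parse.time).
# Times are handled in UTC as a stand-in for "local recording time".

# Numeric start/end: proportions in [0, 1], otherwise page numbers.
resolve_numeric <- function(x, npages) {
  if (x >= 0 && x <= 1) {
    max(1, ceiling(x * npages))        # assumed rounding: proportion -> page
  } else {
    min(max(1, round(x)), npages)      # page numbers clamped to the file
  }
}

# String start/end of the form "HH:MM", "HH:MM:SS" or "N HH:MM[:SS]".
resolve_clock <- function(spec, rec_start) {
  parts <- strsplit(spec, " ", fixed = TRUE)[[1]]
  clock <- parts[length(parts)]
  if (nchar(clock) == 5) clock <- paste0(clock, ":00")  # add seconds
  day0 <- as.Date(rec_start, tz = "UTC")
  if (length(parts) == 2) {
    # "N HH:MM": that time of day after N midnights have passed
    n_days <- as.integer(parts[1])
    return(as.POSIXct(paste(day0 + n_days, clock), tz = "UTC"))
  }
  # "HH:MM": earliest occurrence at or after the recording start
  # (simplified: no snapping to page boundaries)
  when <- as.POSIXct(paste(day0, clock), tz = "UTC")
  if (when < rec_start) when <- when + 86400
  when
}

rec_start <- as.POSIXct("2012-05-23 16:47:50", tz = "UTC")
resolve_numeric(0.5, 104)            # page 52
resolve_numeric(500, 104)            # clamped to page 104
resolve_clock("16:50", rec_start)    # 2012-05-23 16:50:00 UTC
resolve_clock("1 16:50", rec_start)  # 2012-05-24 16:50:00 UTC
```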

Actual data reading proceeds by one of two methods, depending on whether mmap.load is TRUE or FALSE. With mmap.load = FALSE, data is read line by line using readLines until blocksize pages have been read, and is then processed. With mmap.load = TRUE, the mmap package is used to map the entire data file into the address space, byte locations are calculated (depending on the setting of pagerefs), and the data is loaded in blocksize chunks and processed as raw vectors.
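Reading in blocksize chunks amounts to splitting the requested pages into consecutive groups. `page_blocks()` below is a hypothetical helper, not GENEAread code, and it only shows the grouping, not the actual reading.

```r
# Hypothetical sketch, not GENEAread code: split the pages to be read into
# consecutive blocks of at most `blocksize` pages, processed one at a time.
page_blocks <- function(pages, blocksize) {
  if (is.infinite(blocksize)) return(list(pages))   # Inf: a single pass
  split(pages, ceiling(seq_along(pages) / blocksize))
}

blocks <- page_blocks(1:104, 30)
lengths(blocks)                    # 30 30 30 14 (named "1" to "4")

# Concatenating the blocks gives back the original page order, which is why
# the block size does not change the final object.
identical(unname(unlist(blocks)), 1:104)   # TRUE
```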

Both methods have advantages and disadvantages. The mmap method is usually much faster, especially when only the final part of the data is loaded, whereas the readLines method has to process the entire file in such a case. On the other hand, mmap requires a large amount of address space, and so can fail on 32-bit systems. Finally, compressed .bin files can only be read with the readLines method. If mmap reading fails, the function generally attempts to catch the failure and reprocesses the file with the readLines method, giving a warning.
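The catch-and-retry behaviour can be sketched with `tryCatch()`. `read_with_fallback()` and the two reader functions are hypothetical placeholders, not GENEAread functions; the sketch only shows the control flow of trying a fast reader, warning on failure, and falling back to a slower one.

```r
# Hypothetical sketch: try a fast reader first and, if it signals an error,
# warn and fall back to a slower reader. The readers are caller-supplied
# placeholders, not GENEAread functions.
read_with_fallback <- function(path, fast_reader, slow_reader) {
  tryCatch(
    fast_reader(path),
    error = function(e) {
      warning("mmap reading failed (", conditionMessage(e),
              "); retrying with line-by-line reading", call. = FALSE)
      slow_reader(path)
    }
  )
}

# A fast reader that fails, e.g. for lack of address space on 32-bit R
failing_mmap <- function(path) stop("cannot map file")
line_reader  <- function(path) paste("read", path, "line by line")

res <- withCallingHandlers(
  read_with_fallback("TESTfile.bin", failing_mmap, line_reader),
  warning = function(w) {
    message("Warning: ", conditionMessage(w))   # report, then continue
    invokeRestart("muffleWarning")
  }
)
res    # "read TESTfile.bin line by line"
```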

Once the data is loaded, it is calibrated using either the calibration values stored in the binary file or manually supplied values (the gain, offset, luxv and voltv arguments).
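A per-axis gain and offset calibration can be sketched as follows, assuming the GENEActiv convention calibrated = (raw × 100 − offset) / gain. `calibrate_axis()` is a hypothetical helper, and the gain and offset are illustrative values back-solved from the first rows of the example output, not values read from a file header.

```r
# Hypothetical sketch assuming the GENEActiv convention
#   calibrated = (raw * 100 - offset) / gain
# applied per axis. gain = 25344 and offset = 1104 are illustrative values
# back-solved from the example output, not header values.
calibrate_axis <- function(raw, gain, offset) {
  (raw * 100 - offset) / gain
}

# Uncalibrated x values printed for procfile2 in the Results
raw_x <- c(17, 11, 17, 14, 19)
calibrate_axis(raw_x, gain = 25344, offset = 1104)
```

Under these assumptions the results match the calibrated x column printed for procfile in the Results.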

Value

With virtual = FALSE, an "AccData" S3 object with 9 components:

`data.out`	A 6- or 7-column matrix of the processed pages, whose rows are the processed observations in page order. The matrix has columns (timestamp, x-axis, y-axis, z-axis, light, button), or (timestamp, x-axis, y-axis, z-axis, light, button, temperature) if `do.temp = TRUE`. The timestamp is stored as seconds since 1 Jan 1970, in the time zone in which the data were recorded.
`page.timestamps`	The page timestamps as POSIXct objects (as opposed to the numeric timestamps in the `data.out` matrix).
`freq`	The effective sampling frequency (in Hz).
`filename`	The file name of the bin file.
`page.numbers`	The pages that were loaded.
`call`	The function call that the object was created with.
`page.volts`	The battery voltage associated with each loaded page, if `do.volt` is TRUE.
`pagerefs`	The page byte offsets that were computed.
`header`	File header output, as given by `header.info`.
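Because the `data.out` timestamps are local seconds since 1970, a minimal way to display them is to convert with `tz = "UTC"`, so that no further time-zone shift is applied. This sketch assumes that reading of "local seconds" and uses the first timestamp printed in the Results.

```r
# First timestamp printed for procfile in the Results
ts <- 1337791670.00

# Treat the stored local seconds as UTC so that formatting shows the local
# recording time without shifting it again.
t_local <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")
format(t_local, "%Y-%m-%d %H:%M:%S")   # "2012-05-23 16:47:50"
```

This agrees with the start time reported for TESTfile.bin (12-05-23 16:47:50).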

Various processing methods are implemented so that AccData objects can be treated as an ordinary matrix in many cases. See print.AccData for info.

With virtual = TRUE, a "VirtAccData" S3 object with the components page.timestamps, freq, filename, page.numbers, call, pagerefs and header, as above, plus:

`data.out`	A vector containing the timestamps of each page, using local seconds since 1970.
`nobs`	Number of observations per page, after downsampling.

Warning

Reading an entire .bin file will take a long time if the file contains a lot of data. Reading such files without downsampling can use up all available memory. See memory.limit.
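A rough estimate shows how quickly `data.out` grows. The sketch assumes a 7-column double matrix (8 bytes per cell) and ignores R object overhead and the other AccData components; `est_mb()` is a hypothetical helper and "MB" here means 2^20 bytes.

```r
# Rough memory estimate for data.out, assuming a double matrix (8 bytes per
# cell) and ignoring R object overhead and the other AccData components.
est_mb <- function(n_records, n_cols = 7) n_records * n_cols * 8 / 2^20

est_mb(31200)                  # about 1.7 MB: the 104-page test file
est_mb(24 * 3600 * 100)        # about 461 MB: 24 hours at 100 Hz
est_mb(24 * 3600 * 100 / 10)   # about 46 MB: the same day with downsample = 10
```

The first figure is consistent with the "Approx 2 MB of RAM" reported for the test file in the Results.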

This function is specific to the header structure of GENEActiv output files. By design, it should be compatible with all firmware and software versions to date (as of the current release). If the order or names of fields change in future .bin files, this function may have to be updated accordingly.

Examples


binfile  = system.file("binfile/TESTfile.bin", package = "GENEAread")[1]

#Read in the entire file, calibrated
procfile<-read.bin(binfile)
print(procfile)
procfile$data.out[1:5,]

#Uncalibrated (mmap.load left at its default)
procfile2<-read.bin(binfile, calibrate = FALSE)
procfile2$data.out[1:5,]

#Read in again, reusing already computed mmap pagerefs
procfile3<-read.bin(binfile, pagerefs = procfile2$pagerefs )

#Downsample by a factor of 10
procfilelo<-read.bin(binfile, downsample = 10)
print(procfilelo)
object.size(procfilelo) / object.size(procfile)

#Read in a 1 minute interval
procfileshort <- read.bin(binfile, start = "16:50", end = "16:51")
print(procfileshort)

##NOT RUN: Read, and save as an R workspace
#read.bin(binfile, outfile="tmp.Rdata")
#print(load("tmp.Rdata"))
#print(processedfile)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


> library(GENEAread)
Loading required package: bitops
GENEAread 1.1.1 loaded

> ### Name: read.bin
> ### Title: File processing function for binary files.
> ### Aliases: read.bin
> ### Keywords: IO
> 
> ### ** Examples
> 
> 
> binfile  = system.file("binfile/TESTfile.bin", package = "GENEAread")[1]
> 
> #Read in the entire file, calibrated
> procfile<-read.bin(binfile)
Loading required package: mmap
Number of pages in binary file: 104 
Calculated page references... 
Processing...
================================================================================
Processing took: 0.052 secs .
Loaded 31200 records (Approx  2 MB of RAM)
12-05-23 16:47:50.000 (Wed)  to  12-05-23 16:53:01.990 (Wed) 
Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion
> print(procfile)
GENEAread dataset:  31200 records at 100 Hz (Approx  2 MB of RAM)
12-05-23 16:47:50.000 (Wed)  to  12-05-23 16:53:01.990 (Wed) 
[ TESTfile.bin ]
> procfile$data.out[1:5,]
         timestamp                  x              y                z light
[1,] 1337791670.00  0.023516414141414 -0.88728256668 -0.1007852375344     0
[2,] 1337791670.01 -0.000157828282828 -1.08828759181 -0.0929328621908     0
[3,] 1337791670.02  0.023516414141414 -1.04190181678 -0.0733019238320     0
[4,] 1337791670.03  0.011679292929293 -1.06509470429 -0.0654495484884     0
[5,] 1337791670.04  0.031407828282828 -1.11148047932 -0.1400471142521     0
     button temperature
[1,]      0        25.8
[2,]      0        25.8
[3,]      0        25.8
[4,]      0        25.8
[5,]      0        25.8
> 
> #Uncalibrated, mmap off
> procfile2<-read.bin(binfile, calibrate = FALSE)
Number of pages in binary file: 104 
Calculated page references... 
Processing...
================================================================================
Processing took: 0.034 secs .
Loaded 31200 records (Approx  2 MB of RAM)
12-05-23 16:47:50.000 (Wed)  to  12-05-23 16:53:01.990 (Wed) 
Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion
> procfile2$data.out[1:5,]
         timestamp  x    y   z light button temperature
[1,] 1337791670.00 17 -225 -40     0      0        25.8
[2,] 1337791670.01 11 -277 -38     0      0        25.8
[3,] 1337791670.02 17 -265 -33     0      0        25.8
[4,] 1337791670.03 14 -271 -31     0      0        25.8
[5,] 1337791670.04 19 -283 -50     0      0        25.8
> 
> #Read in again, reusing already computed mmap pagerefs
> procfile3<-read.bin(binfile, pagerefs = procfile2$pagerefs )
Number of pages in binary file: 104 
Processing...
================================================================================
Processing took: 0.034 secs .
Loaded 31200 records (Approx  2 MB of RAM)
12-05-23 16:47:50.000 (Wed)  to  12-05-23 16:53:01.990 (Wed) 
Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion
> 
> #Downsample by a factor of 10
> procfilelo<-read.bin(binfile, downsample = 10)
Downsampling to  10  Hz 
Number of pages in binary file: 104 
Calculated page references... 
Processing...
================================================================================
Processing took: 0.017 secs .
Loaded 3120 records (Approx  0 MB of RAM)
12-05-23 16:47:50.000 (Wed)  to  12-05-23 16:53:01.900 (Wed) 
Warning message:
In FUN(newX[, i], ...) : NAs introduced by coercion
> print(procfilelo)
GENEAread dataset:  3120 records at 10 Hz (Approx  0 MB of RAM)
12-05-23 16:47:50.000 (Wed)  to  12-05-23 16:53:01.900 (Wed) 
[ TESTfile.bin ]
> object.size(procfilelo) / object.size(procfile)
0.106183944845651 bytes
> 
> #Read in a 1 minute interval
> procfileshort <- read.bin(binfile, start = "16:50", end = "16:51")
Number of pages in binary file: 104 
Calculated page references... 
Processing...
================================================================================
Processing took: 0.004 secs .
Loaded 6600 records (Approx  0 MB of RAM)
12-05-23 16:49:59.000 (Wed)  to  12-05-23 16:51:04.990 (Wed) 
> print(procfileshort)
GENEAread dataset:  6600 records at 100 Hz (Approx  0 MB of RAM)
12-05-23 16:49:59.000 (Wed)  to  12-05-23 16:51:04.990 (Wed) 
[ TESTfile.bin ]
> 
> ##NOT RUN: Read, and save as a R workspace
> #read.bin(binfile, outfile="tmp.Rdata")
> #print(load("tmp.Rdata"))
> #print(processedfile)
> 