Last data update: 2014.03.03

R: Read data from a DBI connection into an ffdf.
read.dbi.ffdfR Documentation

Read data from a DBI connection into an ffdf.

Description

Read data from a DBI connection into an ffdf. This can for example be used to import large datasets from Oracle, SQLite, MySQL, PostgreSQL, Hive or other SQL databases into R.

Usage

read.dbi.ffdf(query = NULL, dbConnect.args = list(drv = NULL, dbname = NULL,
  username = "", password = ""), dbSendQuery.args = list(),
  fetch.args = list(), x = NULL, nrows = -1, first.rows = NULL,
  next.rows = NULL, levels = NULL, appendLevels = TRUE,
  asffdf_args = list(), BATCHBYTES = getOption("ffbatchbytes"),
  VERBOSE = FALSE, colClasses = NULL, transFUN = NULL, ...)

Arguments

query

the SQL query to execute on the DBI connection

dbConnect.args

a list of arguments to pass to DBI's dbConnect (like drv, dbname, username, password). See the examples.

dbSendQuery.args

a list containing database-specific parameters which will be passed to to pass to dbSendQuery. Defaults to an empty list.

fetch.args

a list containing optional database-specific parameters which will be passed to to pass to fetch. Defaults to an empty list.

x

NULL or an optional ffdf object to which the read records are appended. See documentation in read.table.ffdf for more details and the example below.

nrows

Number of rows to read from the query resultset. Default value of -1 reads in all rows.

first.rows

chunk size (rows) to read for first chunk from the query resultset

next.rows

chunk size (rows) to read sequentially for subsequent chunks from the query resultset. Currently, this must be specified.

levels

optional specification of factor levels. A list with as names the names the columns of the data.frame fetched in the first.rows, containing levels of the factors.

appendLevels

logical. A vector of permissions to expand levels for factor columns. See documentation in read.table.ffdf for more details.

asffdf_args

further arguments passed to as.ffdf (ignored if 'x' gives an ffdf object )

BATCHBYTES

integer: bytes allowed for the size of the data.frame storing the result of reading one chunk. See documentation in read.table.ffdf for more details.

VERBOSE

logical: TRUE to verbose timings for each processed chunk (default FALSE).

colClasses

See documentation in read.table.ffdf

transFUN

function applied to the data frame after each chunk is retreived by fetch

...

optional parameters passed on to transFUN

Details

Opens up the DBI connection using DBI::dbConnect, sends the query using DBI::dbSendQuery and DBI::fetch-es the results in batches of next.rows rows. Heavily borrowed from read.table.ffdf

Value

An ffdf object unless the query returns zero records in which case the function will return the data.frame returned by fetch and possibly transFUN.

See Also

read.table.ffdf, read.odbc.ffdf

Examples

require(ff)

##
## Example query using data in sqlite
##
require(RSQLite)
dbfile <- system.file("smalldb.sqlite3", package="ETLUtils")
drv <- dbDriver("SQLite")
query <- "select * from testdata limit 10000"
x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile),
first.rows = 100, next.rows = 1000, VERBOSE=TRUE)
class(x)
x[1:10, ]

## show it is the same as getting the data directly using RSQLite
## apart from characters which are factors in ffdf objects
directly <- dbGetQuery(dbConnect(drv = drv, dbname = dbfile), query)
directly <- as.data.frame(as.list(directly), stringsAsFactors=TRUE)
all.equal(x[,], directly)

## show how to use the transFUN argument to transform the data before saving into the ffdf
## and shows the use of the levels argument
query <- "select * from testdata limit 10"
x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile),
first.rows = 100, next.rows = 1000, VERBOSE=TRUE, levels = list(a = rev(LETTERS)),
transFUN = function(x, subtractdays){
	x$b <- as.Date(x$b)
	x$b.subtractdaysago <- x$b - subtractdays
	x
}, subtractdays=7)
class(x)
x[1:10, ]
## remark that the levels of column a are reversed due to specifying the levels argument correctly
levels(x$a)

## show how to append data to an existing ffdf object
transformexample <- function(x, subtractdays){
	x$b <- as.Date(x$b)
	x$b.subtractdaysago <- x$b - subtractdays
	x
}
dim(x)
x[,]
combined <- read.dbi.ffdf(query = query,
 dbConnect.args = list(drv = drv, dbname = dbfile),
 first.rows = 100, next.rows = 1000, x = x, VERBOSE=TRUE,
 transFUN = transformexample, subtractdays=1000)
dim(combined)
combined[,]

##
## Example query using ROracle. Do try this at home with some larger data :)
##
## Not run: 
require(ROracle)
query <- "select OWNER, TABLE_NAME, TABLESPACE_NAME, NUM_ROWS, LAST_ANALYZED from all_all_tables"
x <- read.dbi.ffdf(query=query,
dbConnect.args = list(drv = dbDriver("Oracle"),
user = "YourUser", password = "YourPassword", dbname = "Mydatabase"),
first.rows = 100, next.rows = 50000, nrows = -1, VERBOSE=TRUE)

## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(ETLUtils)
Loading required package: ff
Loading required package: bit
Attaching package bit
package:bit (c) 2008-2012 Jens Oehlschlaegel (GPL-2)
creators: bit bitwhich
coercion: as.logical as.integer as.bit as.bitwhich which
operator: ! & | xor != ==
querying: print length any all min max range sum summary
bit access: length<- [ [<- [[ [[<-
for more help type ?bit

Attaching package: 'bit'

The following object is masked from 'package:base':

    xor

Attaching package ff
- getOption("fftempdir")=="/tmp/Rtmp40mvrb"

- getOption("ffextension")=="ff"

- getOption("ffdrop")==TRUE

- getOption("fffinonexit")==TRUE

- getOption("ffpagesize")==65536

- getOption("ffcaching")=="mmnoflush"  -- consider "ffeachflush" if your system stalls on large writes

- getOption("ffbatchbytes")==16777216 -- consider a different value for tuning your system

- getOption("ffmaxbytes")==536870912 -- consider a different value for tuning your system


Attaching package: 'ff'

The following objects are masked from 'package:bit':

    clone, clone.default, clone.list

The following objects are masked from 'package:utils':

    write.csv, write.csv2

The following objects are masked from 'package:base':

    is.factor, is.ordered

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/ETLUtils/read.dbi.ffdf.Rd_%03d_medium.png", width=480, height=480)
> ### Name: read.dbi.ffdf
> ### Title: Read data from a DBI connection into an ffdf.
> ### Aliases: read.dbi.ffdf
> 
> ### ** Examples
> 
> require(ff)
> 
> ##
> ## Example query using data in sqlite
> ##
> require(RSQLite)
Loading required package: RSQLite
Loading required package: DBI
> dbfile <- system.file("smalldb.sqlite3", package="ETLUtils")
> drv <- dbDriver("SQLite")
> query <- "select * from testdata limit 10000"
> x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile),
+ first.rows = 100, next.rows = 1000, VERBOSE=TRUE)
read.dbi.ffdf 1..100 (100)  dbi-read=0.003sec ffdf-write=0.017sec
read.dbi.ffdf 101..1100 (1000)  dbi-read=0.002sec ffdf-write=0.008sec
read.dbi.ffdf 1101..2100 (1000)  dbi-read=0.001sec ffdf-write=0.006sec
read.dbi.ffdf 2101..2860 (760)  dbi-read=0.001sec ffdf-write=0.007sec
 dbi-read=0.007sec  ffdf-write=0.038sec  TOTAL=0.045sec
> class(x)
[1] "ffdf"
> x[1:10, ]
   a          b       c
1  A 2012-04-03 0.25034
2  B 2012-04-03 0.25034
3  C 2012-04-03 0.25034
4  D 2012-04-03 0.25034
5  E 2012-04-03 0.25034
6  F 2012-04-03 0.25034
7  G 2012-04-03 0.25034
8  H 2012-04-03 0.25034
9  I 2012-04-03 0.25034
10 J 2012-04-03 0.25034
> 
> ## show it is the same as getting the data directly using RSQLite
> ## apart from characters which are factors in ffdf objects
> directly <- dbGetQuery(dbConnect(drv = drv, dbname = dbfile), query)
> directly <- as.data.frame(as.list(directly), stringsAsFactors=TRUE)
> all.equal(x[,], directly)
[1] TRUE
> 
> ## show how to use the transFUN argument to transform the data before saving into the ffdf
> ## and shows the use of the levels argument
> query <- "select * from testdata limit 10"
> x <- read.dbi.ffdf(query = query, dbConnect.args = list(drv = drv, dbname = dbfile),
+ first.rows = 100, next.rows = 1000, VERBOSE=TRUE, levels = list(a = rev(LETTERS)),
+ transFUN = function(x, subtractdays){
+ 	x$b <- as.Date(x$b)
+ 	x$b.subtractdaysago <- x$b - subtractdays
+ 	x
+ }, subtractdays=7)
read.dbi.ffdf 1..10 (10)  dbi-read=0.001sec ffdf-write=0.007sec
 dbi-read=0.001sec  ffdf-write=0.007sec  TOTAL=0.008sec
> class(x)
[1] "ffdf"
> x[1:10, ]
   a          b       c b.subtractdaysago
1  A 2012-04-03 0.25034        2012-03-27
2  B 2012-04-03 0.25034        2012-03-27
3  C 2012-04-03 0.25034        2012-03-27
4  D 2012-04-03 0.25034        2012-03-27
5  E 2012-04-03 0.25034        2012-03-27
6  F 2012-04-03 0.25034        2012-03-27
7  G 2012-04-03 0.25034        2012-03-27
8  H 2012-04-03 0.25034        2012-03-27
9  I 2012-04-03 0.25034        2012-03-27
10 J 2012-04-03 0.25034        2012-03-27
> ## remark that the levels of column a are reversed due to specifying the levels argument correctly
> levels(x$a)
 [1] "Z" "Y" "X" "W" "V" "U" "T" "S" "R" "Q" "P" "O" "N" "M" "L" "K" "J" "I" "H"
[20] "G" "F" "E" "D" "C" "B" "A"
> 
> ## show how to append data to an existing ffdf object
> transformexample <- function(x, subtractdays){
+ 	x$b <- as.Date(x$b)
+ 	x$b.subtractdaysago <- x$b - subtractdays
+ 	x
+ }
> dim(x)
[1] 10  4
> x[,]
   a          b       c b.subtractdaysago
1  A 2012-04-03 0.25034        2012-03-27
2  B 2012-04-03 0.25034        2012-03-27
3  C 2012-04-03 0.25034        2012-03-27
4  D 2012-04-03 0.25034        2012-03-27
5  E 2012-04-03 0.25034        2012-03-27
6  F 2012-04-03 0.25034        2012-03-27
7  G 2012-04-03 0.25034        2012-03-27
8  H 2012-04-03 0.25034        2012-03-27
9  I 2012-04-03 0.25034        2012-03-27
10 J 2012-04-03 0.25034        2012-03-27
> combined <- read.dbi.ffdf(query = query,
+  dbConnect.args = list(drv = drv, dbname = dbfile),
+  first.rows = 100, next.rows = 1000, x = x, VERBOSE=TRUE,
+  transFUN = transformexample, subtractdays=1000)
read.dbi.ffdf 1..10 (10)  dbi-read=0sec ffdf-write=0.003sec
 dbi-read=0sec  ffdf-write=0.003sec  TOTAL=0.003sec
> dim(combined)
[1] 20  4
> combined[,]
   a          b       c b.subtractdaysago
1  A 2012-04-03 0.25034        2012-03-27
2  B 2012-04-03 0.25034        2012-03-27
3  C 2012-04-03 0.25034        2012-03-27
4  D 2012-04-03 0.25034        2012-03-27
5  E 2012-04-03 0.25034        2012-03-27
6  F 2012-04-03 0.25034        2012-03-27
7  G 2012-04-03 0.25034        2012-03-27
8  H 2012-04-03 0.25034        2012-03-27
9  I 2012-04-03 0.25034        2012-03-27
10 J 2012-04-03 0.25034        2012-03-27
11 A 2012-04-03 0.25034        2009-07-08
12 B 2012-04-03 0.25034        2009-07-08
13 C 2012-04-03 0.25034        2009-07-08
14 D 2012-04-03 0.25034        2009-07-08
15 E 2012-04-03 0.25034        2009-07-08
16 F 2012-04-03 0.25034        2009-07-08
17 G 2012-04-03 0.25034        2009-07-08
18 H 2012-04-03 0.25034        2009-07-08
19 I 2012-04-03 0.25034        2009-07-08
20 J 2012-04-03 0.25034        2009-07-08
> 
> ##
> ## Example query using ROracle. Do try this at home with some larger data :)
> ##
> ## Not run: 
> ##D require(ROracle)
> ##D query <- "select OWNER, TABLE_NAME, TABLESPACE_NAME, NUM_ROWS, LAST_ANALYZED from all_all_tables"
> ##D x <- read.dbi.ffdf(query=query,
> ##D dbConnect.args = list(drv = dbDriver("Oracle"),
> ##D user = "YourUser", password = "YourPassword", dbname = "Mydatabase"),
> ##D first.rows = 100, next.rows = 50000, nrows = -1, VERBOSE=TRUE)
> ## End(Not run)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>