basicTextGatherer
R Documentation
Cumulate text across callbacks (from an HTTP response)
Description
These functions create callback functions that can be used
with the libcurl engine, which passes information to us
as it becomes available while reading the HTTP response.
basicTextGatherer is a generator function that returns a collection of closures
used to cumulate the text that the libcurl engine provides in callbacks
as it reads the response to an HTTP request.
debugGatherer can be used with the debugfunction
libcurl option in a call. Its update
function is then called whenever libcurl has information
about the headers, the data, or general messages about the
request.
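The debug callback receives each message together with a code that says what kind of information it is. The sketch below files messages by that code. make_debug_gatherer() is hypothetical, and the R-level signature function(msg, type) is an assumption. The type codes follow libcurl's curl_infotype enumeration (0 text, 1 incoming header, 2 outgoing header, 3 incoming data, 4 outgoing data). Except for headerIn, which the examples on this page use, the element names are also assumptions.

```r
# Hypothetical debug gatherer, not RCurl's implementation.
# Assumption: the callback is called as update(msg, type), where type is a
# libcurl curl_infotype code (0 = text, 1 = header in, 2 = header out,
# 3 = data in, 4 = data out; 5 and 6 are SSL data and are ignored here).
make_debug_gatherer <- function() {
  types <- c("text", "headerIn", "headerOut", "dataIn", "dataOut")
  info <- rep(list(character()), length(types))
  names(info) <- types
  update <- function(msg, type) {
    key <- types[type + 1L]             # curl_infotype codes start at 0
    if (!is.na(key))                    # codes outside 0..4 give NA: skip them
      info[[key]] <<- c(info[[key]], msg)
    0L                                  # libcurl expects 0 from a debug callback
  }
  list(update = update, value = function() info)
}

d <- make_debug_gatherer()
d$update("Connected to host\n", 0L)
d$update("HTTP/1.1 200 OK\r\n", 1L)
d$update("GET / HTTP/1.1\r\n", 2L)
names(d$value())
d$value()[["headerIn"]]
```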
These functions return a list of functions.
Each time one calls basicTextGatherer or
debugGatherer, one gets a new, separate
collection of functions. However, each
collection of functions (or instance) shares
the variables across the functions and across calls.
This allows them to store data persistently across
the calls without using a global variable.
In this way, we can have multiple instances of the collection
of functions, with each instance updating its own local state
and not interfering with those of the others.
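The per-instance state described above is ordinary R lexical scoping. Each call to a generator creates a fresh environment, and the functions returned from that call share it. A minimal sketch with a hypothetical make_store() generator, which is not part of RCurl:

```r
# Hypothetical generator: each call creates a new environment in which
# `buf` lives, and the returned functions share that one variable.
make_store <- function() {
  buf <- character()
  list(
    add = function(x) buf <<- c(buf, x),  # <<- updates the shared `buf`
    get = function() buf
  )
}

a <- make_store()
b <- make_store()
a$add("first")
a$add("second")
b$add("other")
a$get()   # "first" "second"
b$get()   # "other", so instance b was not affected by a
```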
We use an S3 class named RCurlCallbackFunction to indicate
that the collection of functions can be used as a callback.
The update function is the one that is actually used
as the callback function in the CURL option.
The value function can be invoked to get the current
state that has been accumulated by the
update function. This is typically used
when the request is complete.
One can reuse the same collection of functions across
different requests. The information will be cumulated.
Sometimes it is convenient to reuse the object but
reset the state to its original empty value, as if it had
been created afresh. The reset function in the collection
permits this.
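Here is a minimal sketch of the update/value/reset trio. It assumes a hypothetical make_text_gatherer() that is not RCurl's implementation. The object carries the RCurlCallbackFunction class name only to mirror the description above. It has not been tried as a callback with RCurl itself.

```r
# Hypothetical gatherer; RCurl's own basicTextGatherer may differ in detail.
make_text_gatherer <- function(txt = character()) {
  initial <- txt                       # remembered so reset() can restore it
  update <- function(str) {
    txt <<- c(txt, str)                # accumulate this chunk in shared state
    nchar(str, "bytes")                # report the number of bytes consumed
  }
  value <- function() paste(txt, collapse = "")
  reset <- function() {
    txt <<- initial                    # back to the state at creation time
    invisible(NULL)
  }
  structure(list(update = update, value = value, reset = reset),
            class = "RCurlCallbackFunction")
}

g <- make_text_gatherer()
g$update("Hello, ")
g$update("world")
g$value()        # "Hello, world"
g$update("!")    # reused without reset: the text keeps accumulating
g$value()        # "Hello, world!"
g$reset()
g$value()        # "" because nothing has been gathered since the reset
```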
multiTextGatherer is used when we are downloading multiple
URIs concurrently in a single libcurl operation. This merely
uses the tools of basicTextGatherer applied to each of
several URIs. See getURIAsynchronous.
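Conceptually, multiTextGatherer keeps one independent accumulator per URI, so chunks from concurrent transfers never mix. The sketch below works under that assumption. It uses a hypothetical make_multi_gatherer() and simulated callbacks, not real ones. As with the uris argument, it accepts either the URI names or just their number.

```r
# Hypothetical sketch, not RCurl's implementation: one accumulator per URI,
# named by URI, or unnamed when only a count is supplied.
make_multi_gatherer <- function(uris) {
  keys <- if (is.numeric(uris) && length(uris) == 1) seq_len(uris) else uris
  gatherers <- lapply(keys, function(k) {
    chunks <- character()              # private to this URI's accumulator
    list(update = function(str) { chunks <<- c(chunks, str); nchar(str, "bytes") },
         value  = function() paste(chunks, collapse = ""))
  })
  if (is.character(keys)) names(gatherers) <- keys
  gatherers
}

uris <- c("http://www.omegahat.net/RCurl/index.html",
          "http://www.omegahat.net/RCurl/philosophy.html")
g <- make_multi_gatherer(uris)
# Simulated interleaved callbacks from two concurrent transfers.
g[[1]]$update("<html>")
g[[2]]$update("<p>")
g[[1]]$update("</html>")
sapply(g, function(x) x$value())
```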
Usage
basicTextGatherer(txt = character(), max = NA, value = NULL,
.mapUnicode = TRUE)
multiTextGatherer(uris, binary = rep(NA, length(uris)))
debugGatherer()
Arguments
txt
an initial character vector with which to start the
accumulated text. This allows one to initialize
the content.
max
if specified as an integer, this limits the total number
of characters that will be read. If more are read, the update function
tells libcurl to stop.
uris
for multiTextGatherer, either the number
or the names of the URIs being downloaded, each of which
needs a separate writer function.
value
if specified, a function that is called when retrieving
the text usually after the completion of the request and the
processing of the response. This function can be used to convert the
result into a different format, e.g. to parse an XML document or
read values from a table in the text.
.mapUnicode
a logical value that controls whether the resulting
text is processed to map escape sequences of the form \uxxxx to the
corresponding Unicode characters.
binary
a logical vector that indicates which URIs yield binary content.
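The sketch below shows how the txt, max, value and .mapUnicode arguments could shape a gatherer. make_gatherer() and map_unicode() are hypothetical helpers, not RCurl functions. The stop signal relies on libcurl's documented write-callback convention: returning a count that differs from the number of bytes received aborts the transfer. Whether RCurl forwards the R return value in exactly this way is an assumption.

```r
# Hypothetical helpers illustrating the arguments; not RCurl's code.

# Replace literal \uxxxx escapes in a string with the characters they denote.
map_unicode <- function(x) {
  m <- gregexpr("\\\\u[0-9A-Fa-f]{4}", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc)
    vapply(esc, function(e) intToUtf8(strtoi(substring(e, 3), 16L)),
           character(1), USE.NAMES = FALSE))
  x
}

make_gatherer <- function(txt = character(), max = NA, value = NULL,
                          .mapUnicode = TRUE) {
  post <- value                        # optional converter for the final text
  update <- function(str) {
    txt <<- c(txt, str)
    # Assumed libcurl convention: a count other than the bytes received
    # tells libcurl to abort the transfer.
    if (!is.na(max) && sum(nchar(txt)) > max) return(0L)
    nchar(str, "bytes")
  }
  value <- function() {
    out <- paste(txt, collapse = "")
    if (.mapUnicode) out <- map_unicode(out)
    if (is.null(post)) out else post(out)
  }
  list(update = update, value = value)
}

g <- make_gatherer(max = 20)
g$update("caf\\u00e9")     # 9 characters so far, under the limit
g$value()                  # "café", with the escape mapped
g$update("0123456789ab")   # 0: the 20-character limit has been exceeded

csv <- make_gatherer(value = function(x) read.csv(text = x))
csv$update("a,b\n1,")
csv$update("2\n3,4\n")
csv$value()$b              # 2 4
```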
Details
The update function is called when the libcurl engine finds sufficient
data on the stream from which it is reading the response.
libcurl cumulates these bytes and hands them to a C routine in
this package, which calls the actual gathering function (or a suitable
replacement) returned as the update component of the gatherer.
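Because libcurl decides where chunk boundaries fall, an update function should make no assumption about them. Below is a small simulation. It uses an arbitrary chunk size, and a plain R loop stands in for libcurl and the package's C routine.

```r
# A plain R loop stands in for libcurl and the package's C routine here.
body <- "<html><body>Hello</body></html>"
size <- 7                                   # arbitrary chunk size
starts <- seq(1, nchar(body), by = size)
chunks <- substring(body, starts, pmin(starts + size - 1, nchar(body)))

received <- character()
update <- function(str) {
  received <<- c(received, str)             # what a gatherer would store
  nchar(str, "bytes")
}
for (ch in chunks) update(ch)

identical(paste(received, collapse = ""), body)   # TRUE
```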
Value
Both the basicTextGatherer and debugGatherer
functions return an object of class
RCurlCallbackFunction.
basicTextGatherer extends this with the class
RCurlTextHandler
and
debugGatherer extends this with the class
RCurlDebugHandler.
Each of these has the same basic structure,
being a list of 3 functions.
update
the function that is called with the text from the
callback routine and which processes this text by accumulating it
into a vector.
value
a function that returns the text cumulated across the
callbacks. This takes an argument collapse (and additional ones)
that are handed to paste.
If the value of collapse is given as NULL,
the vector of elements containing the different text for each
callback is returned. This is convenient when debugging, or when one
knows something about the nature of the callbacks, e.g. that they arrive
in regular sizes that correspond naturally to records.
reset
a function that resets the internal state to its
original, empty value. This can be used to reuse the same object
across requests but to avoid cumulating new input with the material from previous requests.
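Since value() hands collapse to paste, collapse = NULL returns the chunks exactly as the callbacks delivered them. Here is a sketch with hypothetical stored chunks:

```r
# Hypothetical chunks, one element per callback, as a gatherer might store them.
chunks <- c("id,name\n", "1,alpha\n", "2,beta\n")
value <- function(collapse = "", ...) paste(chunks, collapse = collapse, ...)

value()                  # a single string with all the text
value(collapse = NULL)   # the three chunks, unchanged
value(collapse = "|")    # the chunks joined with a visible separator
```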
multiTextGatherer returns a list with an element corresponding
to each URI. Each element is an object obtained by calling
basicTextGatherer, i.e. a collection of 3 functions with
shared state.
Examples
library(RCurl)
if(url.exists("http://www.omegahat.net/RCurl/index.html")) {
txt = getURL("http://www.omegahat.net/RCurl/index.html", write = basicTextGatherer())
h = basicTextGatherer()
txt = getURL("http://www.omegahat.net/RCurl/index.html", write = h$update)
# Cumulate across pages.
txt = getURL("http://www.omegahat.net/index.html", write = h$update)
# h$value() now returns the text cumulated from both pages.
headers = basicTextGatherer()
txt = getURL("http://www.omegahat.net/RCurl/index.html",
header = TRUE, headerfunction = headers$update)
# Now read the headers.
headers$value()
headers$reset()
# Debugging callback
d = debugGatherer()
x = getURL("http://www.omegahat.net/RCurl/index.html", debugfunction = d$update, verbose = TRUE)
names(d$value())
d$value()[["headerIn"]]
uris = c("http://www.omegahat.net/RCurl/index.html",
"http://www.omegahat.net/RCurl/philosophy.html")
g = multiTextGatherer(uris)
txt = getURIAsynchronous(uris, write = g)
names(txt)
nchar(txt)
# Now don't use names for the gatherer elements.
g = multiTextGatherer(length(uris))
txt = getURIAsynchronous(uris, write = g)
names(txt)
nchar(txt)
}
## Not run:
Sys.setlocale(,"en_US.latin1")
Sys.setlocale(,"en_US.UTF-8")
uris = c("http://www.omegahat.net/RCurl/index.html",
"http://www.omegahat.net/RCurl/philosophy.html")
g = multiTextGatherer(uris)
txt = getURIAsynchronous(uris, write = g)
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(RCurl)
Loading required package: bitops
> ### Name: basicTextGatherer
> ### Title: Cumulate text across callbacks (from an HTTP response)
> ### Aliases: basicTextGatherer multiTextGatherer debugGatherer
> ### Keywords: IO
>
> ### ** Examples
>
> if(url.exists("http://www.omegahat.net/RCurl/index.html")) {
+ txt = getURL("http://www.omegahat.net/RCurl/index.html", write = basicTextGatherer())
+
+ h = basicTextGatherer()
+ txt = getURL("http://www.omegahat.net/RCurl/index.html", write = h$update)
+ # Cumulate across pages.
+ txt = getURL("http://www.omegahat.net/index.html", write = h$update)
+
+
+ headers = basicTextGatherer()
+ txt = getURL("http://www.omegahat.net/RCurl/index.html",
+ header = TRUE, headerfunction = headers$update)
+
+ # Now read the headers.
+ headers$value()
+ headers$reset()
+
+
+ # Debugging callback
+ d = debugGatherer()
+ x = getURL("http://www.omegahat.net/RCurl/index.html", debugfunction = d$update, verbose = TRUE)
+ names(d$value())
+ d$value()[["headerIn"]]
+
+
+ uris = c("http://www.omegahat.net/RCurl/index.html",
+ "http://www.omegahat.net/RCurl/philosophy.html")
+ g = multiTextGatherer(uris)
+ txt = getURIAsynchronous(uris, write = g)
+ names(txt)
+ nchar(txt)
+
+ # Now don't use names for the gatherer elements.
+ g = multiTextGatherer(length(uris))
+ txt = getURIAsynchronous(uris, write = g)
+ names(txt)
+ nchar(txt)
+ }
[1] 4284 58512
>
>
> ## Not run:
> ##D Sys.setlocale(,"en_US.latin1")
> ##D Sys.setlocale(,"en_US.UTF-8")
> ##D uris = c("http://www.omegahat.net/RCurl/index.html",
> ##D "http://www.omegahat.net/RCurl/philosophy.html")
> ##D g = multiTextGatherer(uris)
> ##D txt = getURIAsynchronous(uris, write = g)
> ## End(Not run)