filterStream
R Documentation
Connect to Twitter Streaming API and return public statuses that
match one or more filter predicates.
Description
filterStream opens a connection to Twitter's
Streaming API that will return public statuses that match
one or more filter predicates. Tweets can be filtered by
keywords, users, language, and location. The output can
be saved as an object in memory or written to a text
file.
Arguments
file.name
string, name of the file where tweets
will be written. "" indicates output to the console,
which can be redirected to an R object (see examples). If
the file already exists, tweets will be appended (not
overwritten).
follow
string or numeric, vector of Twitter user
IDs, indicating the users whose public statuses should be
delivered on the stream. See the follow parameter
information in the Streaming API documentation for
details:
http://dev.twitter.com/docs/streaming-apis/parameters#follow.
locations
numeric, a vector of longitude, latitude
pairs (with the southwest corner coming first) specifying
sets of bounding boxes to filter public statuses by. See
the locations parameter information in the
Streaming API documentation for details:
http://dev.twitter.com/docs/streaming-apis/parameters#locations
language
string or string vector containing a list
of BCP 47 language identifiers. If not NULL
(default), function will only return tweets that have
been detected as being written in the specified
languages. Note that this parameter can only be used in
combination with at least one of the other filter parameters.
See the Streaming API documentation for details:
https://dev.twitter.com/docs/streaming-apis/parameters#language
timeout
numeric, maximum length of time (in
seconds) of connection to stream. The connection will be
automatically closed after this period. For example,
setting timeout to 10800 will keep the connection
open for 3 hours. The default is 0, which will keep the
connection open permanently.
tweets
numeric, maximum number of tweets to be
collected when the function is called. After that number of
tweets has been captured, the function will stop. If set to
NULL (default), the connection will stay open for
the number of seconds specified in the timeout
parameter.
oauth
an object of class oauth that
contains the access tokens to the user's Twitter session.
This is currently the only method for authentication. See
examples for more details.
verbose
logical, default is TRUE, which
generates some output to the R console with information
about the capturing process.
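As a rough illustration of the layout the locations argument described above expects, the sketch below assembles the numeric vector from named corner coordinates. The helper bbox() is hypothetical and not part of any package; the New York City box is the one used in the examples, and the second box is an arbitrary illustration.

```r
# Hypothetical helper (not part of streamR): one bounding box in the order
# southwest longitude, southwest latitude, northeast longitude, northeast latitude.
bbox <- function(sw_lon, sw_lat, ne_lon, ne_lat) {
  stopifnot(is.numeric(c(sw_lon, sw_lat, ne_lon, ne_lat)), sw_lat <= ne_lat)
  c(sw_lon, sw_lat, ne_lon, ne_lat)
}

nyc <- bbox(-74, 40, -73, 41)

# Several boxes are concatenated into a single numeric vector.
# The coordinates of the second box are made up for illustration.
two_boxes <- c(nyc, bbox(-123, 37, -122, 38))
```

A vector built this way could then be passed as locations=nyc or locations=two_boxes.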
Details
filterStream provides access to the
statuses/filter Twitter stream.
It will return public statuses that match the keywords
given in the track argument, published by the
users specified in the follow argument, written in
the language specified in the language argument,
and sent within the location bounding boxes declared in
the locations argument.
Note that location bounding boxes do not act as filters
for other filter parameters. In the last example below,
we capture all tweets containing the term rstats (even
non-geolocated tweets) OR coming from the New York City
area. For more information on how the Streaming API
request parameters work, check the documentation at:
http://dev.twitter.com/docs/streaming-apis/parameters.
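The OR behaviour described above can be mimicked locally on toy data. This is only a sketch: the three tweets are invented, and the real matching happens on Twitter's servers and may differ in detail.

```r
# Toy data: three hypothetical tweets, invented for illustration only.
toy <- data.frame(
  text = c("learning rstats today", "lunch in Manhattan", "hello from Madrid"),
  lon  = c(NA, -73.98, -3.70),
  lat  = c(NA, 40.75, 40.42),
  stringsAsFactors = FALSE
)

box <- c(-74, 40, -73, 41)  # southwest lon/lat, then northeast lon/lat

# Keyword predicate: applies to every tweet, geolocated or not.
matches_track <- grepl("rstats", toy$text, ignore.case = TRUE)

# Location predicate: only geolocated tweets can fall inside the box.
in_box <- !is.na(toy$lon) &
  toy$lon >= box[1] & toy$lon <= box[3] &
  toy$lat >= box[2] & toy$lat <= box[4]

# The two predicates combine with OR, not AND.
delivered <- toy[matches_track | in_box, ]
```

Here the non-geolocated "rstats" tweet and the Manhattan tweet are both kept, while the Madrid tweet matches neither predicate.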
Also note that the language parameter needs to be
used in combination with another filter option (either
keywords or location).
If any of these arguments is left empty (e.g. no user
filter is specified), the function will return all public
statuses that match the other filters. At least one
predicate parameter must be specified.
Note that when no file name is provided, tweets are
written to a temporary file, which is loaded in memory as
a string vector when the connection to the stream is
closed.
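The temporary-file pattern described above can be sketched in base R as follows. This illustrates the general idea only and is not the package's actual implementation; the incoming lines are invented stand-ins for data arriving from the stream.

```r
# Stand-in for lines arriving from the stream (invented for illustration).
incoming <- c('{"id_str":"1","text":"first"}',
              '{"id_str":"2","text":"second"}')

tmp <- tempfile(fileext = ".json")
con <- file(tmp, open = "a")          # append mode, one tweet per line
for (line in incoming) writeLines(line, con)
close(con)                            # stands in for the stream being closed

tweets <- readLines(tmp)              # character vector, one element per line
unlink(tmp)                           # remove the temporary file
```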
The total number of actual tweets that are captured might
be lower than the number of tweets requested because
blank lines, deletion notices, and incomplete tweets are
included in the count of tweets downloaded.
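To see how many of the captured lines are usable tweets, one option is to filter the raw character vector before parsing. The sketch below assumes one JSON object per line and uses invented sample lines; the completeness check is deliberately crude and only looks for a closing brace.

```r
# Hypothetical raw output: one element per line received from the stream.
raw <- c(
  '{"id_str":"1","text":"hello rstats"}',
  '',
  '{"delete":{"status":{"id_str":"2"}}}',
  '{"id_str":"3","text":"trunc'
)

is_blank    <- !nzchar(trimws(raw))
is_deletion <- grepl('^[{]"delete"', raw)
# Crude completeness check: a complete JSON object ends with a closing brace.
is_complete <- grepl("[}][[:space:]]*$", raw)

tweets_only <- raw[!is_blank & !is_deletion & is_complete]

n_downloaded <- length(raw)          # lines counted toward the requested total
n_usable     <- length(tweets_only)  # lines that look like complete tweets
```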
Examples
## Not run:
## An example of an authenticated request using the ROAuth package,
## where the consumer key and consumer secret are fictitious.
## You can obtain your own at dev.twitter.com
library(ROAuth)
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "xxxxxyyyyyzzzzzz"
consumerSecret <- "xxxxxxyyyyyzzzzzzz111111222222"
my_oauth <- OAuthFactory$new(consumerKey=consumerKey,
    consumerSecret=consumerSecret, requestURL=requestURL,
    accessURL=accessURL, authURL=authURL)
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
## capture tweets mentioning the "rstats" hashtag for one hour
filterStream( file.name="tweets_rstats.json",
    track="rstats", timeout=3600, oauth=my_oauth )
## capture 10 tweets mentioning the "rstats" hashtag
filterStream( file.name="tweets_rstats.json",
    track="rstats", tweets=10, oauth=my_oauth )
## capture tweets published by Twitter's official account
filterStream( file.name="tweets_twitter.json",
    follow="783214", timeout=600, oauth=my_oauth )
## capture tweets sent from New York City in Spanish only, and save them as an object in memory
tweets <- filterStream( file.name="", language="es",
    locations=c(-74,40,-73,41), timeout=600, oauth=my_oauth )
## capture tweets mentioning the "rstats" hashtag or sent from New York City
filterStream( file.name="tweets_rstats.json", track="rstats",
    locations=c(-74,40,-73,41), timeout=600, oauth=my_oauth )
## End(Not run)