a character vector giving the package(s) to look
in for data sets, or NULL.
By default, all packages in the search path are used, then
the ‘data’ subdirectory (if present) of the current working
directory.
lib.loc
a character vector of directory names of R libraries,
or NULL. The default value of NULL corresponds to all
libraries currently known.
verbose
a logical. If TRUE, additional diagnostics are
printed.
envir
the environment where the data should be loaded.
Details
Currently, four formats of data files are supported:
1. files ending ‘.R’ or ‘.r’ are source()d in, with the R working
directory changed temporarily to the directory containing the
respective file. (data ensures that the utils package is attached,
in case it had been run via utils::data.)
2. files ending ‘.RData’ or ‘.rda’ are load()ed.
3. files ending ‘.tab’, ‘.txt’ or ‘.TXT’ are read using
read.table(..., header = TRUE, as.is = FALSE), and hence result in
a data frame.
4. files ending ‘.csv’ or ‘.CSV’ are read using
read.table(..., header = TRUE, sep = ";", as.is = FALSE), and also
result in a data frame.
If more than one matching file name is found, the first on this list
is used. (Files with extensions ‘.txt’, ‘.tab’ or
‘.csv’ can be compressed, with or without further extension
‘.gz’, ‘.bz2’ or ‘.xz’.)
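As a minimal sketch of the fourth format, the following writes a semicolon-separated file into the ‘data’ subdirectory of a scratch working directory and loads it with data(). The directory name csv-demo, the data set name toy and its contents are made up for illustration.

```r
# Hypothetical scratch directory with a 'data' subdirectory.
wd <- file.path(tempdir(), "csv-demo")
dir.create(file.path(wd, "data"), recursive = TRUE, showWarnings = FALSE)

# '.csv' files are read with sep = ";", so the file must use semicolons.
writeLines(c("x;y", "1;a", "2;b"), file.path(wd, "data", "toy.csv"))

old <- setwd(wd)
e <- new.env()
# package = character(0) restricts the search to ./data of the working directory.
utils::data("toy", package = character(0), envir = e)
setwd(old)

str(e$toy)  # a data frame named after the file, without the extension
```

Because the file is read with as.is = FALSE, character columns such as y are expected to arrive as factors.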
The data sets to be loaded can be specified as a set of character
strings or names, or as the character vector list, or as both.
For each given data set, the first two types (‘.R’ or ‘.r’,
and ‘.RData’ or ‘.rda’ files) can create several variables
in the load environment, which might all be named differently from the
data set. The third and fourth types will always result in the
creation of a single variable with the same name (without extension)
as the data set.
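The difference between the file types can be illustrated with a small sketch: one ‘.rda’ file holding two objects, neither of which shares the data set's name. The names rda-demo, pair, first and second are hypothetical.

```r
# Hypothetical scratch directory with a 'data' subdirectory.
wd <- file.path(tempdir(), "rda-demo")
dir.create(file.path(wd, "data"), recursive = TRUE, showWarnings = FALSE)

# One '.rda' file containing two objects with names unrelated to 'pair'.
first  <- 1:3
second <- letters[1:2]
save(first, second, file = file.path(wd, "data", "pair.rda"))

old <- setwd(wd)
e <- new.env()
loaded <- utils::data("pair", package = character(0), envir = e)
setwd(old)

loaded  # the name of the data set that was requested
ls(e)   # the objects actually created in the load environment
```

The return value names the data set that was asked for, not the variables it created, so ls() on the load environment is the way to see what arrived.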
If no data sets are specified, data lists the available data
sets. It looks for a new-style data index in the ‘Meta’ directory or, if
this is not found, an old-style ‘00Index’ file in the ‘data’
directory of each specified package, and uses these files to prepare a
listing. If there is a ‘data’ area but no index, available data
files for loading are computed and included in the listing, and a
warning is given: such packages are incomplete. The information about
available data sets is returned in an object of class
"packageIQR". The structure of this class is experimental.
Where the datasets have a different name from the argument that should
be used to retrieve them, the index will have an entry like
beaver1 (beavers), which tells us that dataset beaver1
can be retrieved by the call data(beavers).
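A short sketch of reading the index and then retrieving a data set through the name given in parentheses. It relies on the results component of the "packageIQR" object, whose structure is documented as experimental, so treat the column access as an assumption that may change.

```r
# List the data sets of the 'datasets' package and pick out the beaver entries.
iqr <- utils::data(package = "datasets")
items <- iqr$results[, "Item"]
grep("beaver", items, value = TRUE)  # entries of the form 'beaver1 (beavers)'

# Retrieve them using the name in parentheses, into a private environment.
e <- new.env()
utils::data("beavers", package = "datasets", envir = e)
ls(e)
```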
If lib.loc and package are both NULL (the
default), the data sets are searched for in all the currently loaded
packages then in the ‘data’ directory (if any) of the current
working directory.
If lib.loc = NULL but package is specified as a
character vector, the specified package(s) are searched for first
amongst loaded packages and then in the default library/ies
(see .libPaths).
If lib.loc is specified (and not NULL), packages
are searched for in the specified library/ies, even if they are
already loaded from another library.
To just look in the ‘data’ directory of the current working
directory, set package = character(0) (and lib.loc =
NULL, the default).
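The search rules above can be sketched as follows. The example uses .Library, the library holding the base packages, because the datasets package can be assumed to live there; the scratch directory name empty-demo is made up.

```r
# An explicit lib.loc: look for 'datasets' only in the given library.
in_lib <- utils::data(package = "datasets", lib.loc = .Library)
nrow(in_lib$results) > 0

# An empty package vector: look only in ./data of the working directory.
# The scratch directory has no 'data' subdirectory, so nothing is listed.
wd <- file.path(tempdir(), "empty-demo")
dir.create(wd, showWarnings = FALSE)
old <- setwd(wd)
local_only <- utils::data(package = character(0))
setwd(old)
nrow(local_only$results)
```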
Value
A character vector of all data sets specified, or information about
all available data sets in an object of class "packageIQR" if
none were specified.
Good practice
data() was originally intended to allow users to load datasets
from packages for use in their examples, and as such it loaded the
datasets into the workspace .GlobalEnv. This avoided
having large datasets in memory when not in use. That need has been
almost entirely superseded by lazy-loading of datasets.
The ability to specify a dataset by name (without quotes) is a
convenience: in programming the datasets should be specified by
character strings (with quotes).
Use of data within a function without an envir argument
has the almost always undesirable side-effect of putting an object in
the user's workspace (and indeed, of replacing any object of that name
already there). It would almost always be better to put the object in
the current evaluation environment by data(..., envir =
environment()). However, two alternatives are usually preferable,
both described in the ‘Writing R Extensions’ manual.
For sets of data, set up a package to use lazy-loading of data.
For objects which are system data, for example lookup tables
used in calculations within the function, use a file
‘R/sysdata.rda’ in the package sources or create the objects by
R code at package installation time.
A sometimes important distinction is that the second approach places
objects in the namespace but the first does not. So if it is important
that the function sees mytable as an object from the package,
it is system data and the second approach should be used. In the
unusual case that a package uses a lazy-loaded dataset as a default
argument to a function, that needs to be specified by ::,
e.g., survival::survexp.us.
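A minimal sketch of two of the points above: loading into the current evaluation environment with a quoted name, and naming a lazy-loaded dataset with :: in a default argument. The function names summarise_arrests and avg_speed are hypothetical, and datasets::cars stands in for a package's own dataset.

```r
# Load into the function's own environment instead of the user's workspace,
# and give the data set as a character string.
summarise_arrests <- function() {
  utils::data("USArrests", package = "datasets", envir = environment())
  colMeans(USArrests)
}
res <- summarise_arrests()
res

# A lazy-loaded dataset used as a default argument is named with '::'.
avg_speed <- function(d = datasets::cars) mean(d$speed)
avg_speed()
```

Neither call leaves a copy of the data in .GlobalEnv, which is the side-effect the text warns about.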
Note
One can take advantage of the search order and the fact that a
‘.R’ file will change directory. If raw data are stored in
‘mydata.txt’ then one can set up ‘mydata.R’ to read
‘mydata.txt’ and pre-process it, e.g., using transform.
For instance one can convert numeric vectors to factors with the
appropriate labels. Thus, the ‘.R’ file can effectively contain
a metadata specification for the plaintext formats.
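The idea can be sketched with a scratch ‘data’ directory holding both files. The file contents, the column names id and g, and the labels are invented for illustration; only the mechanism (the ‘.R’ file is found first and is run from inside the ‘data’ directory) comes from the text above.

```r
# Hypothetical scratch directory with a 'data' subdirectory.
wd <- file.path(tempdir(), "note-demo")
dd <- file.path(wd, "data")
dir.create(dd, recursive = TRUE, showWarnings = FALSE)

# Raw data in a plain-text file.
writeLines(c("id g", "1 1", "2 2", "3 1"), file.path(dd, "mydata.txt"))

# The '.R' file is run with the working directory set to 'data', so a bare
# file name is enough. It recodes the numeric column g as a labelled factor.
writeLines(c(
  'mydata <- transform(read.table("mydata.txt", header = TRUE),',
  '                    g = factor(g, levels = 1:2, labels = c("low", "high")))'
), file.path(dd, "mydata.R"))

old <- setwd(wd)
e <- new.env()
utils::data("mydata", package = character(0), envir = e)
setwd(old)

str(e$mydata)  # g should now be a factor with the labels from the '.R' file
```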
See Also
help for obtaining documentation on data sets,
save for creating the second (‘.rda’) kind
of data, typically the most efficient one.
The ‘Writing R Extensions’ manual for considerations in preparing the
‘data’ directory of a package.
Examples
require(utils)
data() # list all available data sets
try(data(package = "rpart") ) # list the data sets in the rpart package
data(USArrests, "VADeaths") # load the data sets 'USArrests' and 'VADeaths'
## Not run: ## Alternatively
ds <- c("USArrests", "VADeaths"); data(list = ds)
## End(Not run)
help(USArrests) # give information on data set 'USArrests'
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(utils)
> ### Name: data
> ### Title: Data Sets
> ### Aliases: data print.packageIQR
> ### Keywords: documentation datasets
>
> ### ** Examples
>
> require(utils)
> data() # list all available data sets
Data sets in package 'datasets':
AirPassengers Monthly Airline Passenger Numbers 1949-1960
BJsales Sales Data with Leading Indicator
BJsales.lead (BJsales)
Sales Data with Leading Indicator
BOD Biochemical Oxygen Demand
CO2 Carbon Dioxide Uptake in Grass Plants
ChickWeight Weight versus age of chicks on different diets
DNase Elisa assay of DNase
EuStockMarkets Daily Closing Prices of Major European Stock
Indices, 1991-1998
Formaldehyde Determination of Formaldehyde
HairEyeColor Hair and Eye Color of Statistics Students
Harman23.cor Harman Example 2.3
Harman74.cor Harman Example 7.4
Indometh Pharmacokinetics of Indomethacin
InsectSprays Effectiveness of Insect Sprays
JohnsonJohnson Quarterly Earnings per Johnson & Johnson Share
LakeHuron Level of Lake Huron 1875-1972
LifeCycleSavings Intercountry Life-Cycle Savings Data
Loblolly Growth of Loblolly pine trees
Nile Flow of the River Nile
Orange Growth of Orange Trees
OrchardSprays Potency of Orchard Sprays
PlantGrowth Results from an Experiment on Plant Growth
Puromycin Reaction Velocity of an Enzymatic Reaction
Seatbelts Road Casualties in Great Britain 1969-84
Theoph Pharmacokinetics of Theophylline
Titanic Survival of passengers on the Titanic
ToothGrowth The Effect of Vitamin C on Tooth Growth in
Guinea Pigs
UCBAdmissions Student Admissions at UC Berkeley
UKDriverDeaths Road Casualties in Great Britain 1969-84
UKgas UK Quarterly Gas Consumption
USAccDeaths Accidental Deaths in the US 1973-1978
USArrests Violent Crime Rates by US State
USJudgeRatings Lawyers' Ratings of State Judges in the US
Superior Court
USPersonalExpenditure Personal Expenditure Data
UScitiesD Distances Between European Cities and Between
US Cities
VADeaths Death Rates in Virginia (1940)
WWWusage Internet Usage per Minute
WorldPhones The World's Telephones
ability.cov Ability and Intelligence Tests
airmiles Passenger Miles on Commercial US Airlines,
1937-1960
airquality New York Air Quality Measurements
anscombe Anscombe's Quartet of 'Identical' Simple Linear
Regressions
attenu The Joyner-Boore Attenuation Data
attitude The Chatterjee-Price Attitude Data
austres Quarterly Time Series of the Number of
Australian Residents
beaver1 (beavers) Body Temperature Series of Two Beavers
beaver2 (beavers) Body Temperature Series of Two Beavers
cars Speed and Stopping Distances of Cars
chickwts Chicken Weights by Feed Type
co2 Mauna Loa Atmospheric CO2 Concentration
crimtab Student's 3000 Criminals Data
discoveries Yearly Numbers of Important Discoveries
esoph Smoking, Alcohol and (O)esophageal Cancer
euro Conversion Rates of Euro Currencies
euro.cross (euro) Conversion Rates of Euro Currencies
eurodist Distances Between European Cities and Between
US Cities
faithful Old Faithful Geyser Data
fdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
freeny Freeny's Revenue Data
freeny.x (freeny) Freeny's Revenue Data
freeny.y (freeny) Freeny's Revenue Data
infert Infertility after Spontaneous and Induced
Abortion
iris Edgar Anderson's Iris Data
iris3 Edgar Anderson's Iris Data
islands Areas of the World's Major Landmasses
ldeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
lh Luteinizing Hormone in Blood Samples
longley Longley's Economic Regression Data
lynx Annual Canadian Lynx trappings 1821-1934
mdeaths (UKLungDeaths)
Monthly Deaths from Lung Diseases in the UK
morley Michelson Speed of Light Data
mtcars Motor Trend Car Road Tests
nhtemp Average Yearly Temperatures in New Haven
nottem Average Monthly Temperatures at Nottingham,
1920-1939
npk Classical N, P, K Factorial Experiment
occupationalStatus Occupational Status of Fathers and their Sons
precip Annual Precipitation in US Cities
presidents Quarterly Approval Ratings of US Presidents
pressure Vapor Pressure of Mercury as a Function of
Temperature
quakes Locations of Earthquakes off Fiji
randu Random Numbers from Congruential Generator
RANDU
rivers Lengths of Major North American Rivers
rock Measurements on Petroleum Rock Samples
sleep Student's Sleep Data
stack.loss (stackloss)
Brownlee's Stack Loss Plant Data
stack.x (stackloss) Brownlee's Stack Loss Plant Data
stackloss Brownlee's Stack Loss Plant Data
state.abb (state) US State Facts and Figures
state.area (state) US State Facts and Figures
state.center (state) US State Facts and Figures
state.division (state)
US State Facts and Figures
state.name (state) US State Facts and Figures
state.region (state) US State Facts and Figures
state.x77 (state) US State Facts and Figures
sunspot.month Monthly Sunspot Data, from 1749 to "Present"
sunspot.year Yearly Sunspot Data, 1700-1988
sunspots Monthly Sunspot Numbers, 1749-1983
swiss Swiss Fertility and Socioeconomic Indicators
(1888) Data
treering Yearly Treering Data, -6000-1979
trees Girth, Height and Volume for Black Cherry Trees
uspop Populations Recorded by the US Census
volcano Topographic Information on Auckland's Maunga
Whau Volcano
warpbreaks The Number of Breaks in Yarn during Weaving
women Average Heights and Weights for American Women
Use 'data(package = .packages(all.available = TRUE))'
to list the data sets in all *available* packages.
> try(data(package = "rpart") ) # list the data sets in the rpart package
Data sets in package 'rpart':
car.test.frame Automobile Data from 'Consumer Reports' 1990
car90 Automobile Data from 'Consumer Reports' 1990
cu.summary Automobile Data from 'Consumer Reports' 1990
kyphosis Data on Children who have had Corrective Spinal
Surgery
solder Soldering of Components on Printed-Circuit
Boards
stagec Stage C Prostate Cancer
> data(USArrests, "VADeaths") # load the data sets 'USArrests' and 'VADeaths'
> ## Not run:
> ##D ## Alternatively
> ##D ds <- c("USArrests", "VADeaths"); data(list = ds)
> ## End(Not run)
> help(USArrests) # give information on data set 'USArrests'
USArrests package:datasets R Documentation
Violent Crime Rates by US State
Description:
This data set contains statistics, in arrests per 100,000
residents for assault, murder, and rape in each of the 50 US
states in 1973. Also given is the percent of the population
living in urban areas.
Usage:
USArrests
Format:
A data frame with 50 observations on 4 variables.
[,1] Murder numeric Murder arrests (per 100,000)
[,2] Assault numeric Assault arrests (per 100,000)
[,3] UrbanPop numeric Percent urban population
[,4] Rape numeric Rape arrests (per 100,000)
Source:
World Almanac and Book of facts 1975. (Crime rates).
Statistical Abstracts of the United States 1975. (Urban rates).
References:
McNeil, D. R. (1977) _Interactive Data Analysis_. New York:
Wiley.
See Also:
The 'state' data sets.
Examples:
require(graphics)
pairs(USArrests, panel = panel.smooth, main = "USArrests data")