Data for cleaning
R Documentation
Dataset for practicing cleaning, labelling and recoding
Description
The data come from clients of a family planning clinic.
For all variables except ID, the codes 9, 99, 99.9, 888 and 999 represent missing values.
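Taking the statement above at face value, the missing-value codes could be recoded to NA along the following lines. This is only a sketch on a small made-up data frame: `toy` and its values are hypothetical and are not taken from the Planning data.

```r
# Hypothetical toy data using a few of the variable names listed under Format
toy <- data.frame(
  ID    = c(1, 2, 3, 9),          # ID 9 is a real ID, not a missing code
  AGE   = c(25, 99, 31, 40),
  RELIG = c(1, 9, 2, 1),
  WT    = c(52.5, 99.9, 60, 48),
  HT    = c(155, 160, 999, 150)
)

# Codes documented as missing values for every variable except ID
missing_codes <- c(9, 99, 99.9, 888, 999)

# Recode every column except ID
vars <- setdiff(names(toy), "ID")
toy[vars] <- lapply(toy[vars], function(x) replace(x, x %in% missing_codes, NA))

toy
```

A blanket recode like this treats every 9 or 99 as missing in every column. On real data it may be safer to check each variable's range first and apply only the codes that cannot be genuine values for that variable.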
Usage
data(Planning)
Format
A data frame with 251 observations on the following 11 variables.
ID
a numeric vector: ID code
AGE
a numeric vector
RELIG
a numeric vector: Religion
1 = Buddhist
2 = Muslim
PED
a numeric vector: Patient's education level
1 = none
2 = primary school
3 = secondary school
4 = high school
5 = vocational school
6 = university
7 = other
INCOME
a numeric vector: Monthly income in Thai Baht
1 = nil
2 = < 1,000
3 = 1,000-4,999
4 = 5,000-9,999
5 = 10,000
AM
a numeric vector: Age at marriage
REASON
a numeric vector: Reason for family planning
1 = birth spacing
2 = enough children
3 = other
BPS
a numeric vector: systolic blood pressure
BPD
a numeric vector: diastolic blood pressure
WT
a numeric vector: weight (kg)
HT
a numeric vector: height (cm)
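One possible way to attach the value labels listed above is base R's factor(). The sketch below uses short made-up vectors (`relig` and `reason` are hypothetical stand-ins for the RELIG and REASON columns), so it runs without the dataset.

```r
# Hypothetical coded values; the 9 stands in for a missing-value code
relig  <- c(1, 2, 1, 9)
reason <- c(1, 3, 2, 2)

# Codes that are not listed in 'levels' become NA automatically
relig_f <- factor(relig, levels = 1:2,
                  labels = c("Buddhist", "Muslim"))
reason_f <- factor(reason, levels = 1:3,
                   labels = c("birth spacing", "enough children", "other"))

# Tabulate, showing NA as its own category when present
table(relig_f, useNA = "ifany")
table(reason_f)
```

Because unlisted codes turn into NA silently, it may be worth tabulating the raw codes before labelling to make sure no unexpected value is lost.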
Examples
data(Planning)
des(Planning)
# Change variable names to lowercase
names(Planning) <- tolower(names(Planning))
.data <- Planning
des(.data)
# Check for duplication of 'id'
attach(.data)
any(duplicated(id))
duplicated(id)
id[duplicated(id)] #215
# Which one(s) are missing?
setdiff(min(id):max(id), id) # 216
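# Correct the wrong one
# (with .data attached, this assignment creates a modified copy of 'id'
# in the workspace; the 'id' column of .data itself is left unchanged)
id[duplicated(id)] <- 216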
detach(.data)
rm(list=ls())
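As an alternative sketch, the same duplicate check and correction can be written without attach(), so that the change is stored in the data frame itself. The data frame `toy` below is hypothetical. Like the example above, it assumes that exactly one ID is duplicated, that exactly one ID is missing from the sequence, and that the second occurrence of the duplicate is the wrong one.

```r
# Hypothetical toy data: ID 3 appears twice and ID 4 is missing
toy <- data.frame(id  = c(1, 2, 3, 3, 5),
                  age = c(25, 31, 28, 40, 22))

# Flag the second and later occurrences of each ID
dup <- duplicated(toy$id)
toy$id[dup]

# IDs absent from the expected consecutive sequence
missing_id <- setdiff(seq(min(toy$id), max(toy$id)), toy$id)
missing_id

# Only proceed when the one-duplicate, one-gap assumption holds
stopifnot(sum(dup) == 1, length(missing_id) == 1)

# Write the correction into the data frame column directly
toy$id[dup] <- missing_id

any(duplicated(toy$id))
```

Indexing the column as `toy$id` avoids the workspace copy that attach() would create, at the cost of slightly longer expressions.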