Each record represents follow-up data for one breast cancer
case. These are consecutive patients seen by Dr. Wolberg
since 1984, and include only those cases exhibiting invasive
breast cancer and no evidence of distant metastases at the
time of diagnosis.
Usage
data("wpbc")
Format
A data frame with 198 observations on the following 34 variables.
status
a factor with levels N (non-recurrent) and
R (recurrent).
time
recurrence time (for status == "R") or
disease-free time (for status == "N").
mean_radius
radius (mean of distances from center to points on the perimeter) (mean).
mean_texture
texture (standard deviation of gray-scale values) (mean).
mean_perimeter
perimeter (mean).
mean_area
area (mean).
mean_smoothness
smoothness (local variation in radius lengths) (mean).
mean_compactness
compactness (mean).
mean_concavity
concavity (severity of concave portions of the contour) (mean).
mean_concavepoints
concave points (number of concave portions of the contour) (mean).
mean_symmetry
symmetry (mean).
mean_fractaldim
fractal dimension (mean).
SE_radius
radius (mean of distances from center to points on the perimeter) (SE).
SE_texture
texture (standard deviation of gray-scale values) (SE).
SE_perimeter
perimeter (SE).
SE_area
area (SE).
SE_smoothness
smoothness (local variation in radius lengths) (SE).
SE_compactness
compactness (SE).
SE_concavity
concavity (severity of concave portions of the contour) (SE).
SE_concavepoints
concave points (number of concave portions of the contour) (SE).
SE_symmetry
symmetry (SE).
SE_fractaldim
fractal dimension (SE).
worst_radius
radius (mean of distances from center to points on the perimeter) (worst).
worst_texture
texture (standard deviation of gray-scale values) (worst).
worst_perimeter
perimeter (worst).
worst_area
area (worst).
worst_smoothness
smoothness (local variation in radius lengths) (worst).
worst_compactness
compactness (worst).
worst_concavity
concavity (severity of concave portions of the contour) (worst).
worst_concavepoints
concave points (number of concave portions of the contour) (worst).
worst_symmetry
symmetry (worst).
worst_fractaldim
fractal dimension (worst).
tsize
diameter of the excised tumor in centimeters.
pnodes
number of positive axillary lymph nodes observed at time of surgery.
Details
The 30 image features (mean_radius through worst_fractaldim) are
computed from a digitized image of a fine needle aspirate (FNA) of a
breast mass. They describe characteristics of the cell nuclei present
in the image.
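The names of the 30 image-derived variables listed under Format follow a regular scheme: a summary prefix (mean, SE or worst) joined to one of ten feature names. The following is a minimal sketch of how those names could be generated so the variables can be selected as a block; the object names stats, features and image_vars are illustrative only and are not part of the package.

```r
# Summary prefixes and feature names as they appear under Format.
stats <- c("mean", "SE", "worst")
features <- c("radius", "texture", "perimeter", "area", "smoothness",
              "compactness", "concavity", "concavepoints", "symmetry",
              "fractaldim")

# Repeat each prefix once per feature; the feature names are recycled,
# giving mean_radius, ..., mean_fractaldim, SE_radius, ..., worst_fractaldim.
image_vars <- paste(rep(stats, each = length(features)), features, sep = "_")

length(image_vars)

# With the data loaded, the image features could then be selected with
#   wpbc[, image_vars]
```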
There are two possible learning problems: predicting status or predicting
the time to recur.
1) Predicting status (field 2 of the original data file): R = recurrent,
N = non-recurrent
- The dataset should first be filtered to reflect a particular
endpoint; e.g., recurrences before 24 months = positive,
non-recurrence beyond 24 months = negative.
- Estimated accuracy of 86.3% on 2-year recurrence using a
previous version of this data.
2) Predicting time to recur (time, field 3 of the original data file,
in recurrent records)
- Estimated mean error of 13.9 months using Recurrence Surface
Approximation.
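The endpoint filtering described for the first learning problem can be sketched as follows. This is only one possible reading of the rule, shown on a small made-up data frame with the same status and time columns as wpbc. The helper label_endpoint and its horizon argument are hypothetical. Cases the rule does not cover (recurrence at or after the horizon, or disease-free follow-up that stops at or before it) are set to NA and dropped here, which is an assumption rather than something the description prescribes.

```r
# Toy stand-in for wpbc with only the two columns the filter needs
# (values are made up for illustration, not taken from the data).
toy <- data.frame(
  status = factor(c("R", "R", "N", "N", "R"), levels = c("N", "R")),
  time   = c(10, 30, 36, 12, 24)
)

# Hypothetical helper: label cases for a fixed horizon (in months).
#   positive: recurrence before the horizon
#   negative: no recurrence, disease-free beyond the horizon
#   NA:       anything else (not covered by the rule as stated)
label_endpoint <- function(d, horizon = 24) {
  lab <- rep(NA_character_, nrow(d))
  lab[d$status == "R" & d$time < horizon] <- "positive"
  lab[d$status == "N" & d$time > horizon] <- "negative"
  factor(lab, levels = c("negative", "positive"))
}

toy$endpoint <- label_endpoint(toy)

# Keep only the cases with a defined endpoint.
filtered <- toy[!is.na(toy$endpoint), ]
filtered
```

The same helper could be applied to wpbc itself once the data are loaded, and the resulting endpoint factor used as the response in place of status.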
Source
W. Nick Street, Olvi L. Mangasarian and William H. Wolberg (1995).
An inductive learning approach to prognostic prediction.
In A. Prieditis and S. Russell, editors, Proceedings of the
Twelfth International Conference on Machine Learning, pages
522–530, San Francisco, Morgan Kaufmann.
References
Peter Buehlmann and Torsten Hothorn (2007).
Boosting algorithms: regularization, prediction and model fitting.
Statistical Science, 22(4), 477–505.
Examples
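data("wpbc", package = "TH.data")
### fit logistic regression model for status on all covariates except time
### (rows with missing values are dropped under the default na.action)
coef(glm(status ~ ., data = wpbc[, colnames(wpbc) != "time"],
         family = binomial()))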