A data frame with 14 categorical variables (8993 observations).
Explanation of the variable names:
No  Variable        Description                                 Type
 1  INCOME          annual income of household                  ordinal
                    (personal income if single)
 2  SEX             sex                                         nominal
 3  MARITAL.STATUS  marital status                              nominal
 4  AGE             age                                         ordinal
 5  EDUCATION       educational grade                           ordinal
 6  OCCUPATION      type of work                                nominal
 7  AREA            how long the interviewed person has lived   ordinal
                    in the San Francisco/Oakland/San Jose area
 8  DUAL.INCOMES    dual incomes (if married)                   nominal
 9  HOUSEHOLD.SIZE  persons living in the household             ordinal
10  UNDER18         persons in household under 18               ordinal
11  HOUSEHOLDER     householder status                          nominal
12  HOME.TYPE       type of home                                nominal
13  ETHNIC.CLASS    ethnic classification                       nominal
14  LANGUAGE        language most often spoken at home          nominal
Details
A total of N=9409 questionnaires containing 502 questions were
filled out by shopping mall customers in the San Francisco Bay Area.
The dataset is an extract from this survey. It consists of
14 demographic attributes. The dataset is a mixture of nominal and
ordinal variables with many missing values.
The goal is to predict the annual income of the household (INCOME) from
the other 13 demographic attributes.
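
The prediction setup described above can be sketched as follows. This is a
minimal illustration in Python, not part of the data set or of any package.
It assumes each observation is available as a dictionary mapping variable
names to integer level codes, with missing values stored as None; that
representation, the helper names missing_counts and split_xy, and the toy
rows are all hypothetical. The grouping into ordinal and nominal variables
follows the list of variables given in this documentation.

```python
from typing import Dict, List, Optional, Tuple

# Variable types as listed in this documentation; INCOME is the target.
ORDINAL = ["AGE", "EDUCATION", "AREA", "HOUSEHOLD.SIZE", "UNDER18"]
NOMINAL = ["SEX", "MARITAL.STATUS", "OCCUPATION", "DUAL.INCOMES",
           "HOUSEHOLDER", "HOME.TYPE", "ETHNIC.CLASS", "LANGUAGE"]
TARGET = "INCOME"
PREDICTORS = ORDINAL + NOMINAL  # the other 13 demographic attributes

# Assumed representation: integer level codes, None for a missing value.
Row = Dict[str, Optional[int]]


def missing_counts(rows: List[Row]) -> Dict[str, int]:
    """Count missing entries per variable (an absent key counts as missing)."""
    names = [TARGET] + PREDICTORS
    return {name: sum(1 for r in rows if r.get(name) is None) for name in names}


def split_xy(rows: List[Row]) -> Tuple[List[List[Optional[int]]], List[int]]:
    """Drop rows whose target is missing; return predictors and target."""
    X: List[List[Optional[int]]] = []
    y: List[int] = []
    for r in rows:
        target = r.get(TARGET)
        if target is None:
            continue  # cannot be used for fitting without a known INCOME
        X.append([r.get(name) for name in PREDICTORS])
        y.append(target)
    return X, y


# Hypothetical toy rows for illustration only, not taken from the survey.
toy_rows: List[Row] = [
    {"INCOME": 3, "SEX": 1, "AGE": 2, "EDUCATION": None},
    {"INCOME": None, "SEX": 2, "AGE": 4, "EDUCATION": 5},
]

counts = missing_counts(toy_rows)
X, y = split_xy(toy_rows)
```

Missing predictor values are kept as None here rather than imputed, so the
choice of how to treat them (dropping rows, imputing, or using a method that
tolerates missing values) is left to the modelling step.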