buildStrataDF
R Documentation
Builds the "strata" dataframe containing information on target variables Y's
distributions in the different strata, starting from sample data or from a frame
Description
This function builds the information about the strata in the population that is required as
an input by Bethel's algorithm for optimal allocation.
In order to estimate means and standard deviations for target variables Y's, we need data coming from:
(1) a previous round of the survey whose sample we want to plan;
(2) sample data from a survey whose variables are proxies for the ones we are interested in;
(3) a frame containing values of the Y variables (or proxy variables) for the whole population.
In all cases, each unit in the dataset must carry auxiliary information (the X variables)
as well as values of the target Y variables (or proxy variables). Under these conditions it is possible
to build the dataframe "strata", containing information on the distribution of the Y's in the different strata
(namely, means and standard deviations), together with information on the strata themselves (total population,
whether the stratum is to be censused, and the cost per single interview).
If the information is contained in a sample dataset, a variable named WEIGHT is expected to be
present. In case of a frame, no such variable is given, and the function will define a WEIGHT variable
for each unit, whose value is always '1'.
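The default-weight rule for frame data can be sketched in base R as follows. The helper name ensure_weight and the toy values are hypothetical and not part of SamplingStrata; the sketch only illustrates the behaviour described above.

```r
# Hypothetical helper (not part of SamplingStrata): add a unit WEIGHT
# column when the dataset is a frame and carries no weights of its own.
ensure_weight <- function(dataset) {
  if (!"WEIGHT" %in% names(dataset)) {
    dataset$WEIGHT <- rep(1, nrow(dataset))
  }
  dataset
}

# Toy frame with one auxiliary and one target variable (made-up values).
frame <- data.frame(X1 = c(1, 1, 2), Y1 = c(10, 12, 30))
frame <- ensure_weight(frame)

# Toy sample that already has weights: they are left untouched.
smp <- data.frame(X1 = c(1, 2), Y1 = c(10, 30), WEIGHT = c(25, 40))
smp <- ensure_weight(smp)
```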
Missing values of a Y variable are excluded when its means and standard deviations are computed,
so NA's may be present in the dataset.
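As an illustration of the computation described above, the following base R sketch derives a weighted mean and standard deviation per stratum while skipping missing Y values. The function names wmean and wsd, the toy data, and the choice of the sum of weights as divisor are assumptions made for this example; the package may use a different formula, in particular for sample data.

```r
# Weighted mean of y, dropping units where y is missing.
wmean <- function(y, w) {
  ok <- !is.na(y)
  sum(w[ok] * y[ok]) / sum(w[ok])
}

# Weighted standard deviation with the sum of weights as divisor
# (an assumption; the package may use a different divisor for sample data).
wsd <- function(y, w) {
  ok <- !is.na(y)
  m <- wmean(y, w)
  sqrt(sum(w[ok] * (y[ok] - m)^2) / sum(w[ok]))
}

# Toy frame: two strata defined by X1, one missing Y1 value, unit weights.
toy <- data.frame(
  X1 = c(1, 1, 2, 2, 2),
  Y1 = c(53, 61, 10, NA, 20),
  WEIGHT = 1
)

# One row per stratum: population size N, mean M1, standard deviation S1.
by_stratum <- split(toy, toy$X1)
summary_df <- data.frame(
  STRATO = names(by_stratum),
  N  = sapply(by_stratum, function(d) sum(d$WEIGHT)),
  M1 = sapply(by_stratum, function(d) wmean(d$Y1, d$WEIGHT)),
  S1 = sapply(by_stratum, function(d) wsd(d$Y1, d$WEIGHT))
)
```

With this divisor a stratum containing a single unit gets a standard deviation of 0 rather than an undefined value.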
The dataframe "strata" is written to an external file (tab delimited, extension "txt"), and will be
used as an input by optimizeStrata.
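A tab-delimited text file of this kind can be written and read back with base R, as in the sketch below. The file location (a temporary file) and the toy table are assumptions for illustration only; the actual file name and location used by the function are not stated here.

```r
# Toy strata table (made-up values, column names as in the "strata" dataframe).
strata_toy <- data.frame(
  STRATO = c("1*1", "1*2"),
  N = c(184, 1),
  M1 = c(48.3, 98.0),
  S1 = c(26.8, 0.0),
  COST = 1, CENS = 0, DOM1 = 1,
  stringsAsFactors = FALSE
)

# Assumed file location: a temporary file, so nothing in the working
# directory is overwritten.
path <- tempfile(fileext = ".txt")
write.table(strata_toy, file = path, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read it back as a tab-delimited file with a header row.
strata_back <- read.delim(path, stringsAsFactors = FALSE)
```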
Usage
buildStrataDF(dataset)
Arguments
dataset
This is the name of the dataframe containing the sampling data, or frame data.
It is strictly required that auxiliary information is organised in variables named
as X1, X2, ... , Xm (there should be at least one of them) and the target variables
are denoted by Y1, Y2, ... , Yn.
In addition, in case of sample data, a variable named 'WEIGHT' must be present in the dataframe,
containing the weights associated with each sampling unit.
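Before calling the function it can help to verify these naming requirements. The checker below is a hypothetical helper written for this example (it is not part of SamplingStrata), and it assumes that at least one Y variable is needed in addition to at least one X variable.

```r
# Hypothetical checker (not part of SamplingStrata): report which of the
# naming requirements described above a dataset fails to meet.
check_dataset_names <- function(dataset, is_sample = FALSE) {
  cols <- names(dataset)
  problems <- character(0)
  if (!any(grepl("^X[0-9]+$", cols))) {
    problems <- c(problems, "no auxiliary variable named X1, X2, ...")
  }
  if (!any(grepl("^Y[0-9]+$", cols))) {
    problems <- c(problems, "no target variable named Y1, Y2, ...")
  }
  if (is_sample && !"WEIGHT" %in% cols) {
    problems <- c(problems, "sample data without a WEIGHT column")
  }
  problems
}

# Made-up datasets: one follows the conventions, one does not.
good <- data.frame(X1 = c(1, 2), Y1 = c(5.0, 7.5), WEIGHT = c(10, 12))
bad  <- data.frame(region = c(1, 2), income = c(5.0, 7.5))

check_dataset_names(good, is_sample = TRUE)  # character(0): nothing to report
check_dataset_names(bad,  is_sample = TRUE)  # three problems listed
```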
> library(SamplingStrata)
Loading required package: memoise
> ### Name: buildStrataDF
> ### Title: Builds the "strata" dataframe containing information on target
> ### variables Y's distributions in the different strata, starting from
> ### sample data or from a frame
> ### Aliases: buildStrataDF
> ### Keywords: survey
>
> ### ** Examples
>
> data(swissframe)
> strata <- buildStrataDF(swissframe)
Computations have been done on population data
> head(strata)
       STRATO   N       M1        M2        M3       M4       S1       S2
1 1*1*1*1*1*1 184 48.31522  49.40217  61.44022 28.40761 26.81536 28.49831
2 1*1*1*1*1*2   1 98.00000 106.00000 116.00000 43.00000  0.00000  0.00000
3 1*1*1*2*1*1   2 57.00000  64.00000  70.00000 50.00000  4.00000  0.00000
4 1*1*2*1*1*1  11 77.72727  81.18182  92.36364 47.00000 15.24998 18.69768
5 1*2*1*1*1*1   9 58.22222  61.55556  66.77778 36.22222 25.46360 20.27100
6 1*2*1*2*1*1   8 61.00000  68.00000  84.62500 58.37500 24.56624 19.48076
        S3       S4 COST CENS DOM1 X1 X2 X3 X4 X5 X6
1 32.63062 14.63922    1    0    1  1  1  1  1  1  1
2  0.00000  0.00000    1    0    1  1  1  1  1  1  2
3  1.00000 15.00000    1    0    1  1  1  1  2  1  1
4 17.03084 11.12736    1    0    1  1  1  2  1  1  1
5 24.89881 15.49751    1    0    1  1  2  1  1  1  1
6 26.35307 26.55625    1    0    1  1  2  1  2  1  1
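In the output above, each STRATO label matches the values of X1 to X6 joined by "*". A label of that shape can be reproduced in base R as sketched here; the toy data are made up, and this is an illustration of the pattern visible in the output, not necessarily how the package builds the label internally.

```r
# Toy dataset with three auxiliary variables (made-up values).
toy <- data.frame(X1 = c(1, 1, 1), X2 = c(1, 1, 2), X3 = c(1, 2, 1))

# Pick the auxiliary columns by name and join their values with "*".
x_cols <- grep("^X[0-9]+$", names(toy), value = TRUE)
toy$STRATO <- do.call(paste, c(toy[x_cols], sep = "*"))
```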