Box and Cox (1964) considered the following family
of transformations indexed by lambda:
w = (y^lambda-1)/lambda
= expm1(lambda*log(y))/lambda,
with the lambda=0 case defined as
log(y) to make w continuous in
lambda for constant y.
They estimate lambda assuming w
follows a normal distribution. This raises a
theoretical problem in that y must be
positive, which means that w must follow
a truncated normal distribution conditioned on
lambda*w > (-1).
Bickel and Doksum (1981) removed the restriction
to positive y, i.e., to
w > (-1/lambda) by modifying
the transformaton as follows:
w =
(sgn(y)*abs(y)^lambda-1)/lambda
if lambda != 0 and
sgn(y)*log(abs(y)) if lambda = 0,
where sgn(y) = 1 if y >= 0 and -1 otherwise.
NOTE: sgn(y) is different from
link[base]{sign}(y), which is 0 for y = 0.
A two-argument update to the sign function
in the base package has been added to
this Ecfun package, so sign(y, 1)
= sgn(y).
If (y<0), this transformation is discontinuous
at lambda = 0. To see this, we rewrite
this as
w =
(sgn(y)*expm1(lambda*log(abs(y))) +
(sgn(y)-1)) / lambda
where
O(lambda)
indicates a term that is dominated by a constant
times lambda.
If y<0, this latter term (sgn(y)-1)/lambda =
(-2)/lambda and becomes Inf as lambda -> 0.
In practice, we assume that y > 0, so this
distinction has little practical value. However,
the BoxCox function computes the
Bickel-Doksum version.
Box and Cox further noted that proper estimation
of lambda should include the Jacobian of
the transformation in the log(likelihood).
Doing this can be achieved by rescaling the
transformation with the nth root of
the Jacobian, which can be written as follows:
With this the rescaled power transformation is
as follows:
z = (y^lambda-1)/(lambda*j(y, lambda) if lambda!=0
or GeometricMean(y)*log(y) if lambda==0.
In addition to facilitating estimation of
lambda, rescaling has the advantage that
the units of z are the same as the units
of y.
The ouput has class 'BoxCox', which has
attributes that allow the input to be recovered
using invBoxCox. The default values of
the arguments of invBoxCox are provided
by the corresponding attributes
of z.
a numeric vector for which the power transform
is desired
lambda
A numeric vector of length 1 or 2. The first
component is the power. If the second
component is provided, y is replaced
by y+lambda[2].
rescale
logical or numeric. If logical:
For BoxCox, this is TRUE to
return the power transform with rescale,
z, above, and FALSE
to return the power transform without the
nth root of the Jacobian, w,
above. This defaults to TRUE, because
this will give z the same units as
y.
For invBoxCox, this is TRUE
if the input argument z is assumed to
have been rescaled by the nth root of
the Jacobian of the transformation. This
defaults to a rescale attribute of
z if present or to TRUE if
absent.
If numeric, it is assumed to be the geometric
mean of another set of y values to use with
new y's.
na.rm
logical:
TRUE to remove NAs from y
before computing the geometric mean.
FALSE to compute NA
for the geometric mean if any(is.na(y)).
NOTE: If na.rm = FALSE, the
output will be all NA if
rescale = TRUE. This could
produce non useable answers in most cases.
To avoid that, the default for na.rm
is TRUE whenever rescale =
TRUE. Conversely, applications using
na.rm = FALSE will likely also
want rescale = FALSE to avoid
returning a non-answer in these cases. This
explains the default na.rm =
rescale.
z
a numeric vector or an object of class
BoxCox for which the inverse Box-Cox
transform is desired.
sign.y
an optional logical vector giving
sign(y-lambda[2]) of the data values
that presumably generated z. Defaults
to an sign.y attribute of z or
to rep(1, length(z)) if no such
attribute is present.
GeometricMean
an optional numeric scalar giving the
geometric mean of the data values that
presumably generated z. Defaults to
a GeometricMean attribute of
z or to 1 if no such attribute is
present.
Details
Box and Cox (1964) discussed
w(y, lambda) = (y^lambda - 1)/lambda.
They noted that w is continuous in
lambda with w(y, lambda) = log(y)
if lambda = 0 (by l'Hopital's rule).
They noted that proper estimation of
lambda should include the Jacobian of
w(y, lambda) with the likelihood. They further
showed that a naive normal likelihood using
z(y, lambda) as the response without a
Jacobian is equivalent to the normal likelihood
using w(y, lambda) adjusted appropriately
using the Jacobian. See Box and Cox (1964) or
the Wikipedia article on "Power transform".
Bickel and Doksum (1981) suggested adding
sign(y) to the transformation, as
discussed above.
NUMERICAL ANALYSIS:
Consider the Bickel and Doksum version described
above:
w <- (sign(y)*abs(y)^lambda-1)/lambda
if(any(y==0)), GeometricMean(y) = 0. This
creates a problem with the above math.
For y>0, the last term is zero.
boxcox ignores cases with
y<=0 and uses this formula (ignoring the
final O(la*ly)) whenever abs(la) <= eps = 1/50.
That form is used here also.
For invBoxCox a complementary analysis is
as follows:
abs(y+lambda[2]) = abs(1+la*w)^(1/la)
= exp(log1p(la*w)/la) for abs(la*w)<1
= w*(1-la*w*((1/2)-la*w*((1/3)-la*w*(1/4-...))))
Value
BoxCox returns an object of class
BoxCox, being a numeric vector of the
same length as y with the following
optional attributes:
lambdathe value of lambda used in
the transformation
sign.y
sign(y) (or sign(y-lambda[2]) lambda[2] is
provided and if any of these quantities are
negative. Otherwise, this is omitted and all
are assumed to be positive.
rescale
logical:
TRUE if z(y, lambda)
is returned rescaled by g^(lambda-1)
with g = the geometric mean of y
and FALSE if z(y, lambda) is not
so rescaled.
GeometricMean
If rescale is numeric,
attr(., 'GeometricMean') <- rescale.
Otherwise, attr(., 'GeometricMean') is
the Geometric mean of abs(y) =
exp(mean(log(abs(y)))) or of
abs(y+lambda[2]) if(length(lambda)>1).
invBoxCox returns a numeric vector,
reconstructing y from
BoxCox(y, ...).
Source
Bickel, Peter J., and Doksum, Kjell A. (1981) "An
analysis of transformation revisited",
Journal of the American Statistical Association,
76 (374): 296-311
Box, George E. P.; Cox, D. R. (1964). "An analysis
of transformations",
Journal of the Royal Statistical Society,
Series B 26 (2): 211-252.
Box, George E. P.; Cox, D. R. (1982). "An analysis
of transformations revisited, rebutted",
Journal of the American Statistical Association,
77(377): 209-210.