Last data update: 2014.03.03

R: Adequacy of models
goodness.fitR Documentation

Adequacy of models

Description

This function provides some useful statistics to assess the quality of fit of probabilistic models, including the statistics Cramér-von Mises and Anderson-Darling. These statistics are often used to compare models not fitted. You can also calculate other goodness of fit such as AIC, CAIC, BIC, HQIC and Kolmogorov-Smirnov test.

Usage

goodness.fit(pdf, cdf, starts, data, method = "PSO", domain = c(0,Inf),
             mle = NULL,...)

Arguments

pdf

Probability density function;

cdf

Cumulative distribution function;

starts

Initial parameters to maximize the likelihood function;

data

Data vector;

method

Method used for minimization of the function -log(likelihood). The methods supported are: PSO (default), BFGS, Nelder-Mead, SANN, CG. Can also be transmitted only the first letter of the methodology, i.e., P, B, N, S or C respectively;

domain

Domain of probability density function. By default the domain of probability density function is the open interval 0 to infinity.This option must be an vector with two values;

mle

Vector with the estimation maximum likelihood. This option should be used if you already have knowledge of the maximum likelihood estimates. The default is NULL, ie, the function will try to obtain the estimates of maximum likelihoods;

...

If method = "PSO" or method = "P", inform the arguments of the pso function. Get details about these arguments into pso. Basically the arguments that should be provided are the vectors lim_inf and lim_sup. The other parameters of the pso function can be informed in the desired configuration. However, may be omitted if the default configuration is sufficient.

Details

The function goodness.fit returns statistics KS (Kolmogorov-Smirnov), A (Anderson-Darling), W (Cramér-von Misses). Are also calculated other measures of goodness of fit. These functions are: AIC (Akaike Information Criterion), CAIC (Consistent Akaikes Information Criterion), BIC (Bayesian Information Criterion) and HQIC (Hannan-Quinn information criterion).

The Kolmogorov-Smirnov test may return NA with a certain frequency. The return NA informs that the statistical KS is not reliable for the data set used. More details about this issue can be obtained from ks.test.

By default, the function calculates the maximum likelihood estimates. The errors of the estimates are also calculated. In cases that the function can not obtain the maximum likelihood estimates, the change of the values initial, in some cases, resolve the problem. You can also enter with the maximum likelihood estimation if there is already prior knowledge.

Value

W

Statistic Cramér-von Misses;

A

Statistic Anderson Darling;

KS

Kolmogorov Smirnov test;

mle

Maximum likelihood estimates;

AIC

Akaike Information Criterion;

CAIC

Consistent Akaikes Information Criterion;

BIC

Bayesian Information Criterion;

HQIC

Hannan-Quinn information criterion;

Erro

Standard errors of the maximum likelihood estimates;

Value

Minimum value of the function -log(likelihood);

Convergence

0 indicates successful completion and 1 indicates that the iteration limit maxit had been reached. More details at optim.

Note

It is not necessary to define the likelihood function or log-likelihood. You only need to define the probability density function and distribution function.

Author(s)

Pedro Rafael Diniz Marinho pedro.rafael.marinho@gmail.com

References

Chen, G., Balakrishnan, N. (1995). A general purpose approximate goodness-of-fit test. Journal of Quality Technology, 27, 154-161.

Hannan, E. J. and Quinn, B. G. (1979). The Determination of the Order of an Autoregression. Journal of the Royal Statistical Society, Series B, 41, 190-195.

Nocedal, J. and Wright, S. J. (1999) Numerical Optimization. Springer.

Sakamoto, Y., Ishiguro, M. and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.

See Also

For details about the optimization methodologies may view the functions pso and optim.

Examples


# Example 1:

data(carbone)

# Exponentiated Weibull - Probability density function.
pdf_expweibull <- function(par,x){
  beta = par[1]
  c = par[2]
  a = par[3]
  a * beta * c * exp(-(beta*x)^c) * (beta*x)^(c-1) * (1 - exp(-(beta*x)^c))^(a-1)
}

# Exponentiated Weibull - Cumulative distribution function.
cdf_expweibull <- function(par,x){
  beta = par[1]
  c = par[2]
  a = par[3]
  (1 - exp(-(beta*x)^c))^a
}

set.seed(0)
result_1 = goodness.fit(pdf = pdf_expweibull, cdf = cdf_expweibull, 
                        starts = c(1,1,1), data = carbone, method = "PSO",
                        domain = c(0,Inf),mle = NULL, lim_inf = c(0,0,0),
                        lim_sup = c(2,2,2), S = 250, prop=0.1, N=50)
             
x = seq(0, 6, length.out = 500)
hist(carbone, probability = TRUE)
lines(x, pdf_expweibull(x, par = result_1$mle), col = "blue")

# Example 2:

pdf_weibull <- function(par,x){
  a = par[1]
  b = par[2]
  dweibull(x, shape = a, scale = b)
}

cdf_weibull <- function(par,x){
  a = par[1]
  b = par[2]
  pweibull(x, shape = a, scale = b)
}

set.seed(0)
random_data2 = rweibull(250,2,2)
result_2 = goodness.fit(pdf = pdf_weibull, cdf = cdf_weibull, starts = c(1,1), data = random_data2,
             method = "PSO", domain = c(0,Inf), mle = NULL, lim_inf = c(0,0), lim_sup = c(10,10),
             N = 100, S = 250)

x = seq(0,ceiling(max(random_data2)), length.out = 500)
hist(random_data2, probability = TRUE)
lines(x, pdf_weibull(par = result_2$mle, x), col = "blue")

# TO RUN THE CODE BELOW, UNCOMMENT THE CODES.

# Example 3:

# Kumaraswamy Beta - Probability density function.
#pdf_kwbeta <- function(par,x){
#  beta = par[1]
#  a = par[2]
#  alpha = par[3]
#  b = par[4]
#  (a*b*x^(alpha-1)*(1-x)^(beta-1)*(pbeta(x,alpha,beta))^(a-1)*
#  (1-pbeta(x,alpha,beta)^a)^(b-1))/beta(alpha,beta) 
#}

# Kumaraswamy Beta - Cumulative distribution function.
#cdf_kwbeta <- function(par,x){
#  beta = par[1]
#  a = par[2]
#  alpha = par[3]
#  b = par[4]
#  1 - (1 - pbeta(x,alpha,beta)^a)^b
#}

#set.seed(0)
#random_data3 = rbeta(150,2,2.2)

#system.time(goodness.fit(pdf = pdf_kwbeta, cdf = cdf_kwbeta, starts = c(1,1,1,1),
#              data = random_data3, method = "PSO", domain = c(0,1), lim_inf = c(0,0,0,0),
#              lim_sup = c(10,10,10,10), S = 200, prop = 0.1, N = 40))


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(AdequacyModel)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/AdequacyModel/goodness.fit.Rd_%03d_medium.png", width=480, height=480)
> ### Name: goodness.fit
> ### Title: Adequacy of models
> ### Aliases: goodness.fit
> ### Keywords: AdequacyModel pso AIC BIC CAIC distribution survival
> 
> ### ** Examples
> 
> 
> # Example 1:
> 
> data(carbone)
> 
> # Exponentiated Weibull - Probability density function.
> pdf_expweibull <- function(par,x){
+   beta = par[1]
+   c = par[2]
+   a = par[3]
+   a * beta * c * exp(-(beta*x)^c) * (beta*x)^(c-1) * (1 - exp(-(beta*x)^c))^(a-1)
+ }
> 
> # Exponentiated Weibull - Cumulative distribution function.
> cdf_expweibull <- function(par,x){
+   beta = par[1]
+   c = par[2]
+   a = par[3]
+   (1 - exp(-(beta*x)^c))^a
+ }
> 
> set.seed(0)
> result_1 = goodness.fit(pdf = pdf_expweibull, cdf = cdf_expweibull, 
+                         starts = c(1,1,1), data = carbone, method = "PSO",
+                         domain = c(0,Inf),mle = NULL, lim_inf = c(0,0,0),
+                         lim_sup = c(2,2,2), S = 250, prop=0.1, N=50)
There were 35 warnings (use warnings() to see them)
>              
> x = seq(0, 6, length.out = 500)
> hist(carbone, probability = TRUE)
> lines(x, pdf_expweibull(x, par = result_1$mle), col = "blue")
> 
> # Example 2:
> 
> pdf_weibull <- function(par,x){
+   a = par[1]
+   b = par[2]
+   dweibull(x, shape = a, scale = b)
+ }
> 
> cdf_weibull <- function(par,x){
+   a = par[1]
+   b = par[2]
+   pweibull(x, shape = a, scale = b)
+ }
> 
> set.seed(0)
> random_data2 = rweibull(250,2,2)
> result_2 = goodness.fit(pdf = pdf_weibull, cdf = cdf_weibull, starts = c(1,1), data = random_data2,
+              method = "PSO", domain = c(0,Inf), mle = NULL, lim_inf = c(0,0), lim_sup = c(10,10),
+              N = 100, S = 250)
There were 50 or more warnings (use warnings() to see the first 50)
> 
> x = seq(0,ceiling(max(random_data2)), length.out = 500)
> hist(random_data2, probability = TRUE)
> lines(x, pdf_weibull(par = result_2$mle, x), col = "blue")
> 
> # TO RUN THE CODE BELOW, UNCOMMENT THE CODES.
> 
> # Example 3:
> 
> # Kumaraswamy Beta - Probability density function.
> #pdf_kwbeta <- function(par,x){
> #  beta = par[1]
> #  a = par[2]
> #  alpha = par[3]
> #  b = par[4]
> #  (a*b*x^(alpha-1)*(1-x)^(beta-1)*(pbeta(x,alpha,beta))^(a-1)*
> #  (1-pbeta(x,alpha,beta)^a)^(b-1))/beta(alpha,beta) 
> #}
> 
> # Kumaraswamy Beta - Cumulative distribution function.
> #cdf_kwbeta <- function(par,x){
> #  beta = par[1]
> #  a = par[2]
> #  alpha = par[3]
> #  b = par[4]
> #  1 - (1 - pbeta(x,alpha,beta)^a)^b
> #}
> 
> #set.seed(0)
> #random_data3 = rbeta(150,2,2.2)
> 
> #system.time(goodness.fit(pdf = pdf_kwbeta, cdf = cdf_kwbeta, starts = c(1,1,1,1),
> #              data = random_data3, method = "PSO", domain = c(0,1), lim_inf = c(0,0,0,0),
> #              lim_sup = c(10,10,10,10), S = 200, prop = 0.1, N = 40))
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>