R Graphical Manual

Browse All

Last data update: 2014.03.03

R: The Normal Mixture Distribution

dmixnorm

R Documentation

The Normal Mixture Distribution

Description

Density, distribution function, quantile function, and random generation for a univariate (one-dimensional) distribution composed of a mixture of normal distributions with means equal to mean, standard deviations equal to sd, and mixing proportion of the components equal to pro.

Usage

dmixnorm(x, mean, sd, pro)

pmixnorm(q, mean, sd, pro)

qmixnorm(p, mean, sd, pro, expand = 1)

rmixnorm(n, mean, sd, pro)

Arguments

`x`	Vector of quantiles.
`mean`	Vector of means, one for each component.
`sd`	Vector of standard deviations, one for each component. If a single value is provided, an equal-variance mixture model is implemented. Must be non-negative.
`pro`	Vector of mixing proportions, one for each component. If missing, an equal-proportion model is implemented, with a warning. If proportions do not sum to unity, they are rescaled to do so. Must be non-negative.
`q`	Vector of quantiles.
`p`	Vector of probabilities.
`expand`	Value to expand the range of probabilities for quantile approximation. `Default = 1.0`. See `details` below.
`n`	Number of observations.

Details

These functions use, modify, and wrap around those from the mclust package, especially dens, sim, and quantileMclust. Functions are slightly faster than the corresponding mclust functions when used with univariate distributions.

Unlike mclust, which primarily focuses on parameter estimation based on mixture samples, the functions here are modified to calculate PDFs, CDFs, approximate quantiles, and random numbers for mixture distributions with user-specified parameters. Because of these modifications, any number of mixture components can be specified (unlike mclust which limits parameter estimation to a maximum of 9 mixture components). The functions are written to emulate the syntax of other R distribution functions (e.g., dnorm).

The number of mixture components (argument G in mclust) is specified from the length of the mean vector. If a single sd value is provided, an equal-variance mixture model (modelNames="E" in mclust) is implemented; if multiple values are provided, a variable-variance model (modelNames="V" in mclust) is implemented. If mixing proportion pro is missing, all components are assigned equal mixing proportions, with a warning. Mixing proportions are rescaled to sum to unity. If the lengths of supplied means, standard deviations, and mixing proportions conflict, an error is called.

Analytical solutions are not available to calculate a quantile function for all combinations of mixture parameters. qmixnorm approximates the quantile function using a spline function calculated from cumulative density functions for the specified mixture distribution. Quantile values for probabilities near zero and one are approximated by taking a randomly generated sample (with sample size equal to the product of 1000 and the number of mixture components), and expanding that range positively and negatively by a multiple (specified by (default) expand = 1) of the observed range in the random sample. In cases where the distribution range is large (such as when mixture components are discrete or there are large distances between components), resulting extreme probability values will be very close to zero or one and can result in non-calculable (NaN) quantiles (and a warning). Use of other expand values (especially expand < 1.0 that expand the ranges by smaller multiples) often will yield improved approximations. Note that expand values equal to or close to 0 may result in inaccurate approximation of extreme quantiles. In situations requiring extreme quantile values, it is recommended that the largest expand value that does not result in a non-calculable quantile (i.e., no warning called) be used. See examples for confirmation that approximations are accurate, comparing the approximate quantiles from a single 'mixture' distribution to those calculated for the same distribution using qnorm, and demonstrating cases in which using non-default expand values will allow correct approximation of quantiles.

Value

dmixnorm gives the density, pmixnorm gives the distribution function, qmixnorm approximates the quantile function, and rmixnorm generates random numbers.

Author(s)

Phil Novack-Gottshall pnovack-gottshall@ben.edu and Steve Wang scwang@swarthmore.edu, based on functions written by Luca Scrucca.

Examples

# Mixture of two normal distributions
mean <- c(3, 6)
pro <- c(.25, .75)
sd <- c(.5, 1)
x <- rmixnorm(n=5000, mean=mean, pro=pro, sd=sd)
hist(x, n=20, main="random bimodal sample")

## Not run: 
# Requires functions from the 'mclust' package
require(mclust)
# Confirm 'rmixnorm' above produced specified model
mod <- mclust::Mclust(x)
mod             # Best model (correctly) has two-components with unequal variances
mod$parameters	# and approximately same parameters as specified above
sd^2            # Note reports var (sigma-squared) instead of sd used above

## End(Not run)

# Density, distribution, and quantile functions
plot(seq(0, 10, .1), dmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro),
     type="l", main="Normal mixture density")
plot(seq(0, 10, .1), pmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro),
     type="l", main="Normal mixture cumulative")
plot(stats::ppoints(100), qmixnorm(stats::ppoints(100), mean=mean, sd=sd, pro=pro),
     type="l", main="Normal mixture quantile")

# Any number of mixture components are allowed
plot(seq(0, 50, .01), pmixnorm(seq(0, 50, .01), mean=1:50, sd=.05, pro=rep(1, 50)),
     type="l", main="50-component normal mixture cumulative")

# 'expand' can be specified to prevent non-calculable quantiles:
q1 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1))
q1 # Calls a warning because of NaNs
# Reduce 'expand'. (Values < 0.8 allow correct approximation)
q2 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1), expand=.5)
plot(stats::ppoints(30), q2, type="l", main="Quantile with reduced range")

## Not run: 
# Requires functions from the 'mclust' package
# Confirmation that qmixnorm approximates correct solution
#   (single component 'mixture' should mimic qnorm):
x <- rmixnorm(n=5000, mean=0, pro=1, sd=1)
mpar <- mclust::Mclust(x)$param
approx <- qmixnorm(p=ppoints(100), mean=mpar$mean, pro=mpar$pro,
     sd=sqrt(mpar$variance$sigmasq))
known <- qnorm(p=ppoints(100), mean=mpar$mean, sd=sqrt(mpar$variance$sigmasq))
cor(approx, known)  # Approximately the same
plot(approx, main="Quantiles for (unimodal) normal")
lines(known)
legend("topleft", legend=c("known", "approximation"), pch=c(NA,1),
     lty=c(1, NA), bty="n")

## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(KScorrect)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/KScorrect/dmixnorm.Rd_%03d_medium.png", width=480, height=480)
> ### Name: dmixnorm
> ### Title: The Normal Mixture Distribution
> ### Aliases: dmixnorm pmixnorm qmixnorm rmixnorm
> 
> ### ** Examples
> 
> # Mixture of two normal distributions
> mean <- c(3, 6)
> pro <- c(.25, .75)
> sd <- c(.5, 1)
> x <- rmixnorm(n=5000, mean=mean, pro=pro, sd=sd)
> hist(x, n=20, main="random bimodal sample")
> 
> ## Not run: 
> ##D # Requires functions from the 'mclust' package
> ##D require(mclust)
> ##D # Confirm 'rmixnorm' above produced specified model
> ##D mod <- mclust::Mclust(x)
> ##D mod             # Best model (correctly) has two-components with unequal variances
> ##D mod$parameters	# and approximately same parameters as specified above
> ##D sd^2            # Note reports var (sigma-squared) instead of sd used above
> ## End(Not run)
> 
> # Density, distribution, and quantile functions
> plot(seq(0, 10, .1), dmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro),
+      type="l", main="Normal mixture density")
> plot(seq(0, 10, .1), pmixnorm(seq(0, 10, .1), mean=mean, sd=sd, pro=pro),
+      type="l", main="Normal mixture cumulative")
> plot(stats::ppoints(100), qmixnorm(stats::ppoints(100), mean=mean, sd=sd, pro=pro),
+      type="l", main="Normal mixture quantile")
> 
> # Any number of mixture components are allowed
> plot(seq(0, 50, .01), pmixnorm(seq(0, 50, .01), mean=1:50, sd=.05, pro=rep(1, 50)),
+      type="l", main="50-component normal mixture cumulative")
Warning message:
In pmixnorm(seq(0, 50, 0.01), mean = 1:50, sd = 0.05, pro = rep(1,  :
  'equal variance model' implemented. If want 'variable-variance model', specify remaining 'sd's.
> 
> # 'expand' can be specified to prevent non-calculable quantiles:
> q1 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1))
Warning message:
In qmixnorm(stats::ppoints(30), mean = c(1, 20), sd = c(1, 1), pro = c(1,  :
  Some quantile values could not be calculated. If all 'p's are within [0,1], try reducing the value of 'expand' and try again.
> q1 # Calls a warning because of NaNs
 [1] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
[20] NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
> # Reduce 'expand'. (Values < 0.8 allow correct approximation)
> q2 <- qmixnorm(stats::ppoints(30), mean=c(1, 20), sd=c(1, 1), pro=c(1, 1), expand=.5)
> plot(stats::ppoints(30), q2, type="l", main="Quantile with reduced range")
> 
> ## Not run: 
> ##D # Requires functions from the 'mclust' package
> ##D # Confirmation that qmixnorm approximates correct solution
> ##D #   (single component 'mixture' should mimic qnorm):
> ##D x <- rmixnorm(n=5000, mean=0, pro=1, sd=1)
> ##D mpar <- mclust::Mclust(x)$param
> ##D approx <- qmixnorm(p=ppoints(100), mean=mpar$mean, pro=mpar$pro,
> ##D      sd=sqrt(mpar$variance$sigmasq))
> ##D known <- qnorm(p=ppoints(100), mean=mpar$mean, sd=sqrt(mpar$variance$sigmasq))
> ##D cor(approx, known)  # Approximately the same
> ##D plot(approx, main="Quantiles for (unimodal) normal")
> ##D lines(known)
> ##D legend("topleft", legend=c("known", "approximation"), pch=c(NA,1),
> ##D      lty=c(1, NA), bty="n")
> ## End(Not run)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>