Last data update: 2014.03.03

cross.entropy {LEA}    R Documentation

Cross-entropy criterion from snmf runs

Description

Return the cross-entropy criterion for the chosen runs with K ancestral populations. For an example, see snmf. The cross-entropy criterion measures how well the estimated ancestry model predicts genotypes that were masked (hidden) during the run, and is used to evaluate the error of ancestry estimation. The criterion helps to choose the best number of ancestral populations (K) and the best run among a set of snmf runs: a smaller cross-entropy means a better run in terms of predictive capacity. The criterion can be calculated automatically by the snmf function with the option entropy = TRUE.
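The criterion described above can be illustrated with a short base-R sketch. It is not LEA's implementation: the function name masked_cross_entropy and the layout of G (K rows, three consecutive columns per locus holding the frequencies of genotypes 0, 1 and 2) are assumptions made for this illustration, and LEA's exact normalisation may differ. The sketch forms the predicted genotype probabilities P = QG and averages the negative log-probability of the observed genotype over the masked entries only.

```r
# Hypothetical helper (not part of LEA): masked cross-entropy for diploid data.
# Assumed inputs:
#   X    : n x L matrix of observed genotypes coded 0, 1, 2
#   Q    : n x K matrix of ancestry coefficients (rows sum to 1)
#   G    : K x (3 * L) matrix; columns 3*(l-1)+1 .. 3*l hold the assumed
#          frequencies of genotypes 0, 1, 2 at locus l in each ancestral population
#   mask : n x L logical matrix, TRUE where the genotype was hidden during fitting
masked_cross_entropy <- function(X, Q, G, mask) {
  P <- Q %*% G                          # n x (3 * L) predicted genotype probabilities
  idx <- which(mask, arr.ind = TRUE)    # (row, col) positions of masked entries
  i <- idx[, 1]                         # individual index
  l <- idx[, 2]                         # locus index
  cols <- 3 * (l - 1) + X[idx] + 1      # column of the observed genotype at locus l
  p <- P[cbind(i, cols)]                # predicted probability of each observed genotype
  -mean(log(p))                         # average negative log-probability (smaller is better)
}
```

With a uniform prediction (probability 1/3 for each genotype) the value is log(3), so a fitted model is expected to score below that.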

Usage

cross.entropy(object, K, run)

Arguments

object

A snmfProject object.

K

The number of ancestral populations.

run

A vector of the chosen run numbers. If run is omitted, the criterion is returned for all runs with K ancestral populations.

Value

res

A list containing the cross-entropy criterion for the chosen runs with K ancestral populations.
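Because a smaller cross-entropy indicates a better run, the best run can be picked with which.min. The sketch below is hedged: best_run is a hypothetical helper, and it calls unlist() so that it works whether the values arrive as a list (as stated above) or as a numeric vector or matrix. The index it returns is the position among the runs passed in, which equals the run number only when all runs are requested.

```r
# Hypothetical helper (not part of LEA): position of the run with the
# smallest cross-entropy. unlist() flattens a list of per-run values and
# leaves an atomic vector or matrix unchanged.
best_run <- function(ce) {
  v <- unlist(ce)
  as.integer(which.min(v))
}

# Masked-data cross-entropy values printed for runs 1 and 2 (K = 3)
# in the Results transcript below.
ce <- c(0.5813, 0.595736)
best_run(ce)
```

With a real project, the same helper could be applied as best_run(cross.entropy(project, K = 3)).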

Author(s)

Eric Frichot

See Also

geno, snmf, G, Q

Examples

### Example of analyses using snmf ###

# creation of the genotype file, genotypes.geno.
# It contains 400 SNPs for 50 individuals.
data("tutorial")
write.geno(tutorial.R, "genotypes.geno")

################
# runs of snmf #
################

# main options, K: (the number of ancestral populations), 
#        entropy: calculate the cross-entropy criterion, 
#        CPU: the number of CPUs.

# Runs with K = 3 with cross-entropy and 2 repetitions.
project = NULL
project = snmf("genotypes.geno", K = 3, entropy = TRUE, repetitions = 2, 
    project = "new")

# get the cross-entropy for all runs for K = 3 
ce = cross.entropy(project, K = 3)

# get the cross-entropy for the 2nd run for K = 3
ce = cross.entropy(project, K = 3, run = 2)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(LEA)
> ### Name: cross.entropy
> ### Title: Cross-entropy criterion from snmf runs
> ### Aliases: cross.entropy
> ### Keywords: snmf
> 
> ### ** Examples
> 
> ### Example of analyses using snmf ###
> 
> # creation of the genotype file, genotypes.geno.
> # It contains 400 SNPs for 50 individuals.
> data("tutorial")
> write.geno(tutorial.R, "genotypes.geno")
[1] "genotypes.geno"
> 
> ################
> # runs of snmf #
> ################
> 
> # main options, K: (the number of ancestral populations), 
> #        entropy: calculate the cross-entropy criterion, 
> #        CPU: the number of CPUs.
> 
> # Runs with K = 3 with cross-entropy and 2 repetitions.
> project = NULL
> project = snmf("genotypes.geno", K = 3, entropy = TRUE, repetitions = 2, 
+     project = "new")
The project is saved into :
 genotypes.snmfProject 

To load the project, use:
 project = load.snmfProject("genotypes.snmfProject")

To remove the project, use:
 remove.snmfProject("genotypes.snmfProject")

[1] 239882686
[1] "*************************************"
[1] "*          create.dataset            *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)                 50
        -L (number of loci)                        400
        -s (seed random init)                      239882686
        -r (percentage of masked data)             0.05
        -x (genotype file in .geno format)         /home/ddbj/DataUpdator-rgm3/target/genotypes.geno
        -o (output file in .geno format)           /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno

 Write genotype file with masked data, /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno:		OK.

[1] "*************************************"
[1] "* sNMF K = 3  repetition 1      *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)             50
        -L (number of loci)                    400
        -K (number of ancestral pops)          3
        -x (input file)                        /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno
        -q (individual admixture file)         /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run1/genotypes_r1.3.Q
        -g (ancestral frequencies file)        /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run1/genotypes_r1.3.G
        -i (number max of iterations)          200
        -a (regularization parameter)          10
        -s (seed random init)                  239882686
        -e (tolerance error)                   1E-05
        -p (number of processes)               1
        - diploid

Read genotype file /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno:		OK.


Main algorithm:
	[                                                                           ]
	[======================================]
Number of iterations: 101

Least-square error: 5739.745026
Write individual ancestry coefficient file /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run1/genotypes_r1.3.Q:		OK.
Write ancestral allele frequency coefficient file /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run1/genotypes_r1.3.G:	OK.

[1] "*************************************"
[1] "*    cross-entropy estimation       *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)         50
        -L (number of loci)                400
        -K (number of ancestral pops)      3
        -x (genotype file)                 /home/ddbj/DataUpdator-rgm3/target/genotypes.geno
        -q (individual admixture)          /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run1/genotypes_r1.3.Q
        -g (ancestral frequencies)         /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run1/genotypes_r1.3.G
        -i (with masked genotypes)         /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno
        - diploid

Cross-Entropy (all data):	 0.479597
Cross-Entropy (masked data):	 0.5813
The project is saved into :
 genotypes.snmfProject 

To load the project, use:
 project = load.snmfProject("genotypes.snmfProject")

To remove the project, use:
 remove.snmfProject("genotypes.snmfProject")

[1] 1931795865
[1] "*************************************"
[1] "*          create.dataset            *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)                 50
        -L (number of loci)                        400
        -s (seed random init)                      1931795865
        -r (percentage of masked data)             0.05
        -x (genotype file in .geno format)         /home/ddbj/DataUpdator-rgm3/target/genotypes.geno
        -o (output file in .geno format)           /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno

 Write genotype file with masked data, /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno:		OK.

[1] "*************************************"
[1] "* sNMF K = 3  repetition 2      *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)             50
        -L (number of loci)                    400
        -K (number of ancestral pops)          3
        -x (input file)                        /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno
        -q (individual admixture file)         /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run2/genotypes_r2.3.Q
        -g (ancestral frequencies file)        /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run2/genotypes_r2.3.G
        -i (number max of iterations)          200
        -a (regularization parameter)          10
        -s (seed random init)                  1931795865
        -e (tolerance error)                   1E-05
        -p (number of processes)               1
        - diploid

Read genotype file /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno:		OK.


Main algorithm:
	[                                                                           ]
	[========================]
Number of iterations: 63

Least-square error: 5710.071614
Write individual ancestry coefficient file /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run2/genotypes_r2.3.Q:		OK.
Write ancestral allele frequency coefficient file /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run2/genotypes_r2.3.G:	OK.

[1] "*************************************"
[1] "*    cross-entropy estimation       *"
[1] "*************************************"
summary of the options:

        -n (number of individuals)         50
        -L (number of loci)                400
        -K (number of ancestral pops)      3
        -x (genotype file)                 /home/ddbj/DataUpdator-rgm3/target/genotypes.geno
        -q (individual admixture)          /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run2/genotypes_r2.3.Q
        -g (ancestral frequencies)         /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/K3/run2/genotypes_r2.3.G
        -i (with masked genotypes)         /home/ddbj/DataUpdator-rgm3/target/genotypes.snmf/masked/genotypes_I.geno
        - diploid

Cross-Entropy (all data):	 0.479197
Cross-Entropy (masked data):	 0.595736
The project is saved into :
 genotypes.snmfProject 

To load the project, use:
 project = load.snmfProject("genotypes.snmfProject")

To remove the project, use:
 remove.snmfProject("genotypes.snmfProject")

> 
> # get the cross-entropy for all runs for K = 3 
> ce = cross.entropy(project, K = 3)
> 
> # get the cross-entropy for the 2nd run for K = 3
> ce = cross.entropy(project, K = 3, run = 2)