Last data update: 2014.03.03

R: State of the Union Data Set
USA {KODAMA}    R Documentation

State of the Union Data Set

Description

This dataset consists of the spoken (not written) State of the Union addresses from 1900 through the sixth address by Barack Obama in 2014. Punctuation characters, numbers, words shorter than three characters, and stop-words (e.g., "that", "and", and "which") were removed. The result is a dataset of 86 speeches described by 834 distinct meaningful words. Term frequency-inverse document frequency (TF-IDF), a weighting scheme widely used in information retrieval and text mining, was used to compute the feature vectors. The TF-IDF value of a word increases in proportion to the number of times the word appears in a document, but is offset by how frequently the word occurs across the corpus, which controls for the fact that some words are generally more common than others.
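The weighting described above can be illustrated on a toy corpus. This is only a sketch: the documents and words below are made up, and the exact TF-IDF variant used to build the dataset is not stated here, so raw counts and a natural-log inverse document frequency are assumptions.

```r
# Toy corpus: three tiny "documents" (hypothetical, not taken from the USA dataset)
docs <- list(
  d1 = c("economy", "jobs", "economy", "peace"),
  d2 = c("peace", "war", "peace"),
  d3 = c("economy", "jobs", "taxes")
)

vocab <- sort(unique(unlist(docs)))

# Term frequency: raw count of each vocabulary word in each document
tf <- t(vapply(
  docs,
  function(d) tabulate(factor(d, levels = vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(tf) <- vocab

# Inverse document frequency (assumed form): log(N / documents containing the word)
idf <- log(length(docs) / colSums(tf > 0))

# TF-IDF: scale each column (word) of tf by that word's idf
tfidf <- sweep(tf, 2, idf, `*`)
round(tfidf, 3)
```

Words that occur in only one document ("taxes", "war") receive a larger weight per occurrence than words shared by two documents ("economy", "jobs", "peace").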

Usage

data(USA)

Value

A list with the following elements:

data

TF-IDF feature matrix. A matrix with 86 rows (speeches) and 834 columns (words).

year

Year index. A vector with 86 elements.

president

President index. A vector with 86 elements.
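A quick consistency check of the three elements might look like the sketch below. The helper check_usa_layout is hypothetical (it is not part of KODAMA), and the code only touches the real dataset if the KODAMA package is installed and still provides USA, which may depend on the package version.

```r
# Hypothetical helper (not part of KODAMA): checks that a list has the
# documented layout, i.e. one row of `data` per entry of `year` and `president`.
check_usa_layout <- function(x) {
  stopifnot(is.list(x), all(c("data", "year", "president") %in% names(x)))
  stopifnot(is.matrix(x$data))
  n <- nrow(x$data)
  length(x$year) == n && length(x$president) == n
}

# Only inspect the real dataset when KODAMA is installed.
if (requireNamespace("KODAMA", quietly = TRUE)) {
  data(USA, package = "KODAMA")
  if (exists("USA")) {
    print(dim(USA$data))          # documented as 86 x 834
    print(check_usa_layout(USA))
  }
}
```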

References

Cacciatore S, Luchinat C, Tenori L.
Knowledge discovery by accuracy maximization.
Proc Natl Acad Sci U S A 2014;111(14):5117-22.

Examples

# Analysis of the State of the Union addresses
# of US presidents, as shown in Cacciatore et al. (2014).
# WARNING: This example is computationally intensive.
#
# library(KODAMA)
# data(USA)
# kk <- KODAMA(USA$data)
# cc <- cmdscale(kk$dissimilarity)
# par(cex = 0.5, mar = c(15, 6, 2, 2))
# plot(USA$year, cc[, 1], axes = FALSE, pch = 20, xlab = "", ylab = "First Component")
# axis(1, at = USA$year, labels = rownames(USA$data), las = 2)
# axis(2, las = 2)
# box()
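
# The scaling and plotting steps can be tried cheaply on stand-in data.
# This is a sketch only: the matrix and years below are random, hypothetical
# values, and plain Euclidean distances replace kk$dissimilarity, so the
# picture says nothing about the real speeches or about KODAMA itself.

```r
set.seed(1)

# Stand-in for USA$data: 10 "speeches" x 5 "words" (random, hypothetical values)
x <- matrix(runif(50), nrow = 10,
            dimnames = list(paste0("speech", 1:10), NULL))
year <- 2001:2010                 # hypothetical year index

# Euclidean distances used here in place of the KODAMA dissimilarity (assumption)
d <- dist(x)

# Classical multidimensional scaling; returns 2 coordinates per row by default
cc <- cmdscale(d)

# Same plotting calls as in the commented example, applied to the stand-in data
par(cex = 0.5, mar = c(15, 6, 2, 2))
plot(year, cc[, 1], axes = FALSE, pch = 20, xlab = "", ylab = "First Component")
axis(1, at = year, labels = rownames(x), las = 2)
axis(2, las = 2)
box()
```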


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(KODAMA)
Loading required package: e1071
Loading required package: plsgenomics
Loading required package: MASS
Loading required package: boot
Loading required package: parallel
Loading required package: class

Attaching package: 'KODAMA'

The following object is masked from 'package:plsgenomics':

    transformy

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/KODAMA/USA.Rd_%03d_medium.png", width=480, height=480)
> ### Name: USA
> ### Title: State of the Union Data Set
> ### Aliases: USA
> ### Keywords: datasets
> 
> ### ** Examples
> 
> # Here is reported the analysis on the State of the Union 
> # of USA president as shown in Cacciatore, et al. (2014)
> # WARNING: This example is high computational extensive
> #
> # data(USA)
> # kk=KODAMA(USA$data)
> # cc=cmdscale(kk$dissimilarity)
> # par(cex=0.5,mar=c(15,6,2,2));
> # plot(USA$year,cc[,1],axes=F,pch=20,xlab="",ylab="First Component");
> # axis(1,at=USA$year,labels=rownames(USA$data),las=2);
> # axis(2,las=2);
> # box()
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>