R Graphical Manual

Last data update: 2014.03.03

computeClusterSample

R Documentation

Random sample of clustered data

Description

Draws a random sample of the data clustered by `computeKmeans`.

Usage

computeClusterSample(channel, km, sampleFraction, sampleSize, scaled = FALSE,
  includeId = FALSE, test = FALSE)

Arguments

`channel`	connection object as returned by `odbcConnect`.
`km`	an object of class `"toakmeans"` obtained with `computeKmeans`.
`sampleFraction`	one or more sample fractions to use when sampling the data. (Multiple sampling fractions are not yet supported, so supply a single value.)
`sampleSize`	total sample size (applies only when `sampleFraction` is missing).
`scaled`	logical: indicates whether original (default) or scaled data is returned.
`includeId`	logical: indicates whether the sample should include the key uniquely identifying each data row.
`test`	logical: if TRUE, only show what would be done (similar to the `test` parameter of the RODBC functions `sqlQuery` and `sqlSave`).
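
To make the relationship between `sampleFraction` and `sampleSize` concrete, here is a minimal local sketch in base R. It is not the toaster implementation: it clusters the built-in `iris` measurements with `stats::kmeans` and draws a random sample within each cluster. The helper `sampleByCluster` is hypothetical, and converting a total `sampleSize` into one common fraction is an assumption about one reasonable approach; the in-database logic may differ.

```r
# Local illustration only: base R, no database connection required.
set.seed(42)
x   <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]
fit <- kmeans(x, centers = 3, nstart = 5)

# Hypothetical helper: sample rows within each cluster.
# If sampleFraction is missing, derive it from the total sampleSize
# (assumption: the same fraction is applied to every cluster).
sampleByCluster <- function(data, cluster, sampleFraction, sampleSize) {
  if (missing(sampleFraction)) {
    sampleFraction <- sampleSize / nrow(data)
  }
  stopifnot(sampleFraction > 0, sampleFraction <= 1)
  rowsByCluster <- split(seq_len(nrow(data)), cluster)
  picked <- lapply(rowsByCluster, function(rows) {
    # keep at least one row per cluster
    n <- max(1L, round(length(rows) * sampleFraction))
    rows[sample.int(length(rows), n)]
  })
  idx <- unlist(picked, use.names = FALSE)
  cbind(data[idx, , drop = FALSE], cluster = cluster[idx])
}

byFraction <- sampleByCluster(x, fit$cluster, sampleFraction = 0.2)
bySize     <- sampleByCluster(x, fit$cluster, sampleSize = 30)
table(byFraction$cluster)
```

Because the per-cluster counts are rounded, the total drawn for a requested `sampleSize` can differ slightly from the requested number in this sketch.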

Value

`computeClusterSample` returns an object of class `"toakmeans"` (compatible with class `"kmeans"`).
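
The phrase "compatible with class `"kmeans"`" can be read in terms of R's S3 class system. The sketch below is an assumption about what that compatibility amounts to, not a description of toaster internals. It fits an ordinary `stats::kmeans` model on local data, prepends an invented class name (`"demo_toakmeans"`), and shows that code written for `"kmeans"` objects still applies.

```r
# Local illustration only: no Aster connection is involved.
set.seed(1)
fit <- kmeans(scale(mtcars[, c("mpg", "hp", "wt")]), centers = 3)

# Hypothetical subclass: the name "demo_toakmeans" is invented here.
demo <- fit
class(demo) <- c("demo_toakmeans", class(fit))

inherits(demo, "kmeans")   # TRUE: still recognised as a kmeans object
names(demo)                # components defined by stats::kmeans
nrow(demo$centers)         # one row of centroid coordinates per cluster
```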

Examples

if(interactive()){
library(RODBC)    # provides odbcDriverConnect
library(toaster)

# initialize connection to Lahman baseball database in Aster
# (paste0 keeps line breaks and indentation out of the connection string)
conn = odbcDriverConnect(connection=paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

# cluster batting records from seasons after 2000 into 5 groups on g, r and h
km = computeKmeans(conn, "batting", centers=5, iterMax = 25,
                   aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
                   id="playerid || '-' || stint || '-' || teamid || '-' || yearid", 
                   include=c('g','r','h'), scaledTableName='kmeans_test_scaled', 
                   centroidTableName='kmeans_test_centroids',
                   where="yearid > 2000")
# draw a 1% random sample of the clustered data
km = computeClusterSample(conn, km, 0.01)
km
# pairs plot of the clustered sample
createClusterPairsPlot(km, title="Batters Clustered by G, H, R", ticks=FALSE)
}
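
The remaining arguments can be exercised in the same way. The sketch below uses only parameters listed under Arguments. It assumes an open connection `conn` and a fitted `km` like those in the example above, and the wrapper name `sampleVariants` is hypothetical. What `test = TRUE` returns is not stated on this page, so the result is simply kept for inspection.

```r
# Hypothetical wrapper; calling it requires the toaster package
# and a live Aster connection.
sampleVariants <- function(conn, km) {
  list(
    # fixed total sample size instead of a fraction
    bySize   = computeClusterSample(conn, km, sampleSize = 1000),
    # scaled values plus the key identifying each row
    scaledId = computeClusterSample(conn, km, 0.01, scaled = TRUE, includeId = TRUE),
    # dry run: only show what would be done
    dryRun   = computeClusterSample(conn, km, 0.01, test = TRUE)
  )
}
```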
