Last data update: 2014.03.03

computeClusterSample    R Documentation

Random sample of clustered data

Description

Draws a random sample of the data clustered with computeKmeans, sized either by a sample fraction or by a total sample size.

Usage

computeClusterSample(channel, km, sampleFraction, sampleSize, scaled = FALSE,
  includeId = FALSE, test = FALSE)

Arguments

channel

connection object as returned by odbcConnect.

km

an object of class "toakmeans" obtained with computeKmeans.

sampleFraction

one or more sample fractions to use when sampling the data. (Multiple sample fractions are not yet supported.)

sampleSize

total sample size (applies only when sampleFraction is missing).

scaled

logical: indicates whether the original (default) or the scaled data is returned.

includeId

logical: indicates whether the sample should include the key that uniquely identifies each data row.

test

logical: if TRUE, only show what would be done without executing it (similar to the test parameter in the RODBC functions sqlQuery and sqlSave).
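To make the interplay of sampleFraction, sampleSize, scaled and includeId concrete, here is a minimal in-memory sketch in base R. It does not connect to Aster and is not how computeClusterSample is implemented (the real function works inside the database). The function name localClusterSample, the use of stats::kmeans in place of computeKmeans, and the row index standing in for the id key are all assumptions made for illustration. The sketch draws a simple random sample of rows and attaches each sampled row's cluster label; like the current computeClusterSample, it accepts only a single sample fraction.

```r
# HYPOTHETICAL in-memory analogue of computeClusterSample(); not part of toaster.
# Mirrors the sampleFraction / sampleSize / scaled / includeId arguments on a
# local data frame clustered with stats::kmeans().
localClusterSample <- function(data, km, sampleFraction, sampleSize,
                               scaled = FALSE, includeId = FALSE) {
  n <- nrow(data)
  # sampleSize applies only when sampleFraction is missing
  size <- if (!missing(sampleFraction)) round(n * sampleFraction) else sampleSize
  idx <- sample.int(n, size)                 # simple random sample of row indices
  # return either the original or the scaled values
  values <- if (scaled) scale(data) else as.matrix(data)
  out <- data.frame(values[idx, , drop = FALSE], clusterid = km$cluster[idx])
  # the row index stands in for the unique key (assumption)
  if (includeId) out <- cbind(id = idx, out)
  out
}

# Usage on synthetic data (hypothetical columns named after the example's g, r, h)
set.seed(42)
x <- data.frame(g = rpois(500, 80), r = rpois(500, 30), h = rpois(500, 60))
km <- kmeans(scale(x), centers = 3, iter.max = 25)
s1 <- localClusterSample(x, km, sampleFraction = 0.01)        # 1% of 500 rows
s2 <- localClusterSample(x, km, sampleSize = 50, includeId = TRUE)
s3 <- localClusterSample(x, km, sampleSize = 10, scaled = TRUE)
print(head(s2))
```

Because sampleSize is consulted only when sampleFraction is missing, the fraction wins whenever both are supplied, matching the argument descriptions above.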

Value

computeClusterSample returns an object of class "toakmeans" (compatible with class "kmeans").

See Also

computeKmeans

Examples

if(interactive()){
library(RODBC)
library(toaster)
# initialize connection to Lahman baseball database in Aster
conn = odbcDriverConnect(connection = paste0("driver={Aster ODBC Driver};",
                         "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

# cluster batting records from 2001 onward on games (g), runs (r) and hits (h)
km = computeKmeans(conn, "batting", centers=5, iterMax = 25,
                   aggregates = c("COUNT(*) cnt", "AVG(g) avg_g", "AVG(r) avg_r", "AVG(h) avg_h"),
                   id="playerid || '-' || stint || '-' || teamid || '-' || yearid",
                   include=c('g','r','h'), scaledTableName='kmeans_test_scaled',
                   centroidTableName='kmeans_test_centroids',
                   where="yearid > 2000")
# draw a 1% random sample of the clustered data
km = computeClusterSample(conn, km, 0.01)
km
createClusterPairsPlot(km, title="Batters Clustered by G, H, R", ticks=FALSE)
close(conn)
}
