Computes a histogram of a table column in Aster by mapping its values to
bins according to the parameters specified. When the column is of numeric or
temporal data type, it uses the map-reduce histogram function over continuous
values. When the column is categorical (character data types), it defers to
computeBarchart, which uses the SQL aggregate COUNT(*) with
GROUP BY <column>. The result is a data frame suitable for visualizing as bar charts
(see creating visualizations with createHistogram).
pre-built summary of data to use (required when test=TRUE). See getTableSummary.
columnFrequency
logical: indicates whether to build a histogram of the frequencies of column values
binMethod
one of several methods to determine the number and size of bins: 'manual' indicates to use
the parameters below; both 'Sturges' and 'Scott' use the corresponding methods of computing the number
of bins and width (see http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width).
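As a rough illustration of the two automatic methods, the textbook formulas from the referenced article can be sketched in base R on a local vector. This is not the function's own implementation, which works against the database; the helper names sturgesBins and scottWidth are hypothetical and the data is made up.

```r
# Hypothetical helpers illustrating the textbook formulas;
# they are not part of the package.

# Sturges: number of bins k = ceiling(log2(n)) + 1
sturgesBins <- function(n) {
  ceiling(log2(n)) + 1
}

# Scott: bin width h = 3.49 * sd(x) / n^(1/3)
scottWidth <- function(x) {
  3.49 * sd(x) / length(x)^(1/3)
}

set.seed(1)
x <- rnorm(1000, mean = 4.5, sd = 1.5)  # made-up sample

k <- sturgesBins(length(x))  # number of bins suggested by Sturges
h <- scottWidth(x)           # bin width suggested by Scott
```

Sturges depends only on the number of rows, while Scott also depends on the spread of the values, so the two can suggest quite different binnings for the same column.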
binsize
size (width) of discrete intervals defining histogram (all bins are equal)
startvalue
lower end (bound) of values to include in histogram
endvalue
upper end (bound) of values to include in histogram
numbins
number of bins to use in histogram
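To make the manual parameters concrete, the following base R sketch maps a few local values to equal-width bins defined by a start value, an end value, and a bin size. It only illustrates the idea of equal-width binning; the treatment of values that fall exactly on the boundaries is an assumption, and the data is made up.

```r
# Made-up values mirroring the manual binning arguments
startvalue <- 0
endvalue   <- 10
binsize    <- 0.2

x <- c(0.05, 0.2, 3.14, 9.99, 12.5, -1)  # made-up column values

# Keep values inside [startvalue, endvalue); boundary handling is an assumption
inRange <- x >= startvalue & x < endvalue

# Zero-based index of the equal-width bin each remaining value falls into
binIndex <- floor((x[inRange] - startvalue) / binsize)

# Lower edge of each bin, and a count of values per bin
binStart <- startvalue + binIndex * binsize
counts <- table(binStart)
```

Values outside the start and end bounds (here 12.5 and -1) are dropped before counting, which is the effect the bounds are described as having.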
useIQR
logical: indicates whether to use the IQR interval to compute lower and upper cutoff bounds for the values to be included in
the histogram: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR], where IQR = Q3 - Q1
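The cutoff interval above can be checked on a small local vector with base R. This is a sketch under assumptions: it uses R's default quantile definition, which may differ from how quartiles are computed in the database, and the data is made up.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)  # made-up data with one outlier

# First and third quartiles (R's default quantile type; an assumption here)
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Cutoff bounds: [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Values that would be included in the histogram
kept <- x[x >= lower & x <= upper]
```

With this sample the outlier 100 falls above the upper bound and is excluded, while the values 1 through 9 are kept.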
datepart
field to extract from timestamp/date/time column to build histogram on
where
specifies criteria that the table rows must satisfy before the
computation is applied. The criteria are expressed in the form of SQL predicates (inside
a WHERE clause).
by
for optional grouping by one or more values, for faceting or similar uses
test
logical: if TRUE, only show what would be done (similar to the parameter test in RODBC
functions such as sqlQuery and sqlSave).
oldStyle
logical: indicates whether old-style histogram parameters are in use (before Aster AF 5.11)
See Also
computeBarchart and createHistogram
Examples
if (interactive()) {
  # initialize connection to Lahman baseball database in Aster
  # (connection string is built with paste0 so it contains no line break)
  conn = odbcDriverConnect(connection = paste0("driver={Aster ODBC Driver};",
    "server=<dbhost>;port=2406;database=<dbname>;uid=<user>;pwd=<pw>"))

  # Histogram of team ERA distribution: Rangers vs. Yankees in 2000s
  h2000s = computeHistogram(channel=conn, tableName='pitching_enh', columnName='era',
                            binsize=0.2, startvalue=0, endvalue=10, by='teamid',
                            where="yearID between 2000 and 2012 and teamid in ('NYA','TEX')")

  # Plot one facet per team from the computed histogram data frame
  createHistogram(h2000s, fill='teamid', facet='teamid',
                  title='TEX vs. NYY 2000-2012', xlab='ERA', ylab='count',
                  legendPosition='none')
}