Generate a spectrogram for a signal. The signal is chopped into
overlapping slices, each slice is windowed, and a Fourier transform
is applied to determine the frequency components of that slice.
Usage
specgram(x, n = min(256, length(x)), Fs = 2, window = hanning(n),
overlap = ceiling(length(window)/2))
## S3 method for class 'specgram'
plot(x, col = gray(0:512 / 512), xlab="time", ylab="frequency", ...)
## S3 method for class 'specgram'
print(x, col = gray(0:512 / 512), xlab="time", ylab="frequency", ...)
Arguments
x
the vector of samples.
n
the size of the Fourier transform window.
Fs
the sample rate, Hz.
window
shape of the Fourier transform window, defaults to
hanning(n). A window length may be given instead, in which case a
Hanning window of that length is used.
overlap
overlap with the previous window, in samples; defaults to half the
window length.
col
color scale used for the underlying image function.
xlab,ylab
axis labels with sensible defaults.
...
additional arguments passed to the underlying plot functions.
Details
When the result of specgram is printed, a spectrogram is plotted.
As with lattice plots, automatic printing does not work inside loops
and function calls, so explicit calls to print or plot are needed
there.
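A minimal sketch of the explicit call inside a loop. It assumes the signal package is installed (the block is skipped otherwise); the two test tones and their names are made up for illustration.

```r
Fs <- 8000
tt <- (0:(Fs - 1)) / Fs                      # one second of sample times
tones <- list(low  = sin(2 * pi * 440 * tt), # hypothetical test signals
              high = sin(2 * pi * 1760 * tt))

if (requireNamespace("signal", quietly = TRUE)) {
  for (name in names(tones)) {
    sg <- signal::specgram(tones[[name]], n = 256, Fs = Fs)
    # Inside a loop a bare `sg` draws nothing, so plot explicitly.
    # `main` travels through `...` to the underlying plot function.
    plot(sg, main = name)
  }
}
```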
The choice of window defines the time-frequency resolution. In
speech, for example, a wide window shows more harmonic detail, while a
narrow window averages over the harmonic detail and shows more
formant structure. The shape of the window is not critical as long
as it tapers gradually to zero at the ends.
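The trade-off can be made concrete with a little arithmetic. The helper below is hypothetical (not part of the package), and the 16 kHz sample rate and the window lengths are assumed values chosen for illustration.

```r
# Hypothetical helper: time span and FFT bin spacing for a window of n samples.
window_tradeoff <- function(n, Fs) {
  c(duration_ms    = 1000 * n / Fs,  # how much signal one slice covers
    bin_spacing_hz = Fs / n)         # bin spacing when FFT length equals n
}

Fs <- 16000
wide   <- window_tradeoff(512, Fs)   # 32 ms per slice, bins 31.25 Hz apart
narrow <- window_tradeoff(64, Fs)    #  4 ms per slice, bins 250 Hz apart
```

With harmonics spaced roughly 100 Hz apart (an assumed pitch), the wide window has bins fine enough to separate them, while the narrow window's 250 Hz spacing blurs them together and leaves the broader formant envelope.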
Step size (which is window length minus overlap) controls the
horizontal scale of the spectrogram. Decrease it to stretch, or
increase it to compress. Increasing step size will reduce time
resolution, but decreasing it will not improve it much beyond the
limits imposed by the window size (you do gain a little bit,
depending on the shape of your window, as the peak of the window
slides over peaks in the signal energy). The range 1-5 msec is good
for speech.
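Since overlap is given in samples, a step size in milliseconds has to be converted first. The helper below is a hypothetical sketch of that conversion, with an assumed 16 kHz sample rate and a 256-sample window.

```r
# Hypothetical helper: overlap (in samples) that yields a given step in ms.
overlap_for_step <- function(window_len, step_ms, Fs) {
  step <- max(1, round(step_ms * Fs / 1000))  # step size in samples
  stopifnot(step <= window_len)
  window_len - step                           # step = window length - overlap
}

Fs <- 16000
n  <- 256
ov <- overlap_for_step(n, step_ms = 2, Fs = Fs)  # 32-sample step, overlap 224

# Number of full windows that fit in one second of signal at this step
n_slices <- floor((Fs - n) / (n - ov)) + 1
```

The resulting value could then be passed as the overlap argument, for example specgram(x, n = n, Fs = Fs, overlap = ov).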
FFT length controls the vertical scale. Selecting an FFT length
greater than the window length does not add any information to the
spectrum, but it is a good way to interpolate between frequency
points which can make for prettier spectrograms.
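The interpolation effect can be checked in base R. This is a sketch only, not the package's implementation: hann() is an assumed Hann-type window and slice_spectrum() is a hypothetical helper that zero-pads one windowed slice to the FFT length.

```r
# Assumed Hann-type window that tapers toward zero at both ends.
hann <- function(n) 0.5 - 0.5 * cos(2 * pi * (1:n) / (n + 1))

# Hypothetical helper: window one slice, zero-pad to nfft, keep 0..Nyquist.
slice_spectrum <- function(seg, nfft) {
  stopifnot(nfft >= length(seg))
  padded <- c(seg * hann(length(seg)), rep(0, nfft - length(seg)))
  fft(padded)[1:(nfft %/% 2 + 1)]
}

Fs  <- 8000
seg <- sin(2 * pi * 1000 * (0:127) / Fs)     # one 128-sample slice, 1 kHz tone
coarse <- slice_spectrum(seg, nfft = 128)    # 65 bins, 62.5 Hz apart
fine   <- slice_spectrum(seg, nfft = 512)    # 257 bins, 15.625 Hz apart

# Every 4th bin of the padded spectrum coincides with an unpadded bin,
# so padding only fills in points between the existing ones.
pad_error <- max(Mod(fine[seq(1, 257, by = 4)] - coarse))
```

Here pad_error is at the level of floating-point rounding, which is the sense in which a longer FFT adds no information.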
After you have generated the spectral slices, there are a number of
decisions for displaying them. First the phase information is
discarded and the energy normalized:
S = abs(S); S = S/max(S)
Then the dynamic range of the signal is chosen. Since information in
speech is well above the noise floor, it makes sense to eliminate any
dynamic range at the bottom end. This is done by taking the max of
the magnitude and some minimum energy such as minE=-40dB. Similarly,
there is not much information in the very top of the range, so
clipping to a maximum energy such as maxE=-3dB makes sense:
S = pmax(S, 10^(minE/10)); S = pmin(S, 10^(maxE/10))
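A runnable sketch of the normalization and clipping steps in R. The small complex matrix is a made-up stand-in for the S component of a specgram result; the dB conversion follows the 10^(x/10) form used in the text. Note that in R the clipping must be elementwise, which is what pmax and pmin do; max and min would collapse the matrix to a single number.

```r
# Made-up complex values standing in for the S component.
S <- matrix(complex(real      = c(1, 0.5, 1e-6, 0.02),
                    imaginary = c(0, 0.5, 0,    0)),
            nrow = 2)
minE <- -40                    # dB floor
maxE <- -3                     # dB ceiling

S <- abs(S)                    # discard phase, keep magnitude
S <- S / max(S)                # normalize so the peak is 1
S <- pmax(S, 10^(minE / 10))   # raise everything below the floor
S <- pmin(S, 10^(maxE / 10))   # clip everything above the ceiling
```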
The frequency range of the FFT is from 0 to the Nyquist frequency of
one half the sampling rate. If the signal of interest is band
limited, you do not need to display the entire frequency range. In
speech for example, most of the signal is below 4 kHz, so there is no
reason to display up to the Nyquist frequency of 10 kHz for a 20 kHz
sampling rate. In this case you will want to keep only the first 40%
of the rows of the returned S and f. More generally, to display the
frequency range [minF, maxF], you could use the following row index:
idx = (f >= minF & f <= maxF)
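In R the same row index can be applied to both S and f. The values below are toy stand-ins (six frequency rows, three time columns) chosen to mirror the 4 kHz out of 10 kHz example.

```r
# Toy stand-ins for the f and S components of a specgram result.
f <- seq(0, 10000, by = 2000)              # Hz, up to a 10 kHz Nyquist
S <- matrix(seq_len(18), nrow = length(f)) # rows = frequencies, cols = times

minF <- 0
maxF <- 4000
idx <- f >= minF & f <= maxF               # logical row index

S_band <- S[idx, , drop = FALSE]           # keep all time columns
f_band <- f[idx]
```

drop = FALSE keeps S_band a matrix even if only one frequency row survives.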
Then there is the choice of colormap. A brightness varying colormap
such as copper or bone gives good shape to the ridges and valleys. A
hue varying colormap such as jet or hsv gives an indication of the
steepness of the slopes. The final spectrogram is displayed in log
energy scale and by convention has low frequencies on the bottom of
the image.
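A base-R sketch of the display step, using made-up magnitudes rather than a real specgram result. It uses the default gray scale from the Usage section; rainbow() from base R could be swapped in as a hue-varying alternative.

```r
# Made-up magnitudes: rows are frequencies, columns are times.
f  <- seq(0, 4000, length.out = 5)     # Hz
tm <- (0:3) * 0.1                      # seconds
S  <- outer(seq_along(f), seq_along(tm), function(i, j) 1 / (i + j))

db <- 20 * log10(S / max(S))           # log scale, 0 dB at the peak

# image() expects z[i, j] at (x[i], y[j]), so transpose to put time on x.
# The y axis runs upward, so low frequencies land at the bottom.
image(tm, f, t(db), col = gray(0:512 / 512),
      xlab = "time", ylab = "frequency")
```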
Value
specgram returns a list of class specgram with the following items:
S
complex output of the FFT, one column per slice.
f
the frequency indices corresponding to the rows of S.
t
the time indices corresponding to the columns of S.
Author(s)
Original Octave version by Paul Kienzle
pkienzle@users.sf.net. Conversion to R by Tom Short.