Automated and Robust Measurement of Signal Features
by Dr. Kathryn A. Cortopassi
Measurement of signal characteristics (or features) is a crucial step in many bioacoustic studies. Such feature sets can, for example, form the basis of analyses relating signal attributes to various biological measures and contexts, or of recognition-discrimination tasks providing for automatic extraction of relevant signals from large data sets. However, traditional approaches to measurement of time-frequency signal characters, such as duration, bandwidth, etc., have a number of problems.
Often, researchers make signal measurements manually, using interactive tools for display and measurement of time-frequency spectrograms. Where no other means of feature extraction is available, the manual approach can provide valuable and important measurement data. However, measurements made in this way can be plagued by unintended subjectivity. Perhaps most notorious is the assessment of signal onset and offset, in both time and frequency, for estimation of signal duration and bandwidth. Signal boundaries are rarely sharp, and the assessment of begin and end points can be strongly biased by researcher experience and expectation. In addition, handmade measurements have an inherent lack of repeatability, and they are strongly affected by spectrogram display settings often native to the particular sound analysis platform.
Even when measurements are made using objective criteria, many traditional measurement protocols rely on good estimation of signal extremes, and so can be exceptionally sensitive to noise or other outlying clutter. A traditional measure of signal bandwidth, for example, is the frequency range 3 dB down from the peak of the signal power spectrum, or the half-power point. Spurious spectral peaks, or other outliers, due to recording system noise or environmental background noise and clutter can drive the outcome of the measurement. When background noise conditions are unfavorable, this approach can result in easily perturbed and unreliable bandwidth estimates; the same signal measured in different noise environments could return widely different bandwidth estimates. This measurement, and in fact any measure relying on peak or extreme values to characterize signal structure, is inherently volatile and susceptible to destabilization in the presence of noise.
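The sensitivity of the half-power bandwidth can be illustrated with a small sketch. It is not taken from any implementation of the method; the function name `half_power_bandwidth`, the synthetic Gaussian spectrum, and the single spurious bin are all assumptions chosen for illustration.

```python
import numpy as np

def half_power_bandwidth(freqs, power):
    """Span between the outermost bins lying within 3 dB of the spectral peak."""
    above = np.nonzero(power >= 0.5 * power.max())[0]
    return freqs[above[-1]] - freqs[above[0]]

# Hypothetical power spectrum: a Gaussian band centered at 3 kHz.
freqs = np.arange(0.0, 10000.0, 100.0)
clean = np.exp(-0.5 * ((freqs - 3000.0) / 300.0) ** 2)

# The same spectrum with one spurious bin at 9.5 kHz, about 2.2 dB below the peak.
noisy = clean.copy()
noisy[95] = 0.6 * clean.max()

bw_clean = half_power_bandwidth(freqs, clean)
bw_noisy = half_power_bandwidth(freqs, noisy)
print(bw_clean, bw_noisy)
```

In this toy case a single outlying bin stretches the estimate from the width of the band itself to nearly the full distance between the band and the outlier.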
While ambient noise cancellation methods can be employed prior to measurement, they are never perfect, and traditional measurements based on signal extrema can suffer as a result. Our solution is to take a statistical approach to the measurement of signal features. This approach is resistant to perturbation by spurious noise events, and results in more robust and reliable feature estimates.
We begin by considering the signal as a distribution of energy in time and in frequency. We think of this signal energy distribution as analogous to a probability density function in statistics, with the variate being either time or frequency, and the density being the fraction of the total signal energy at that value of the variate. From this distribution function, we can make various measures of central tendency and of dispersion around that central location in order to characterize the signal structure in time and frequency. Familiar estimators of central tendency and dispersion include the various statistical moments: mean, variance, and skewness.
While moments would provide an objective and repeatable feature set with which to characterize signal structure, these familiar statistical estimators break down immediately in the face of outliers. Instead, we use more robust estimators of central tendency and dispersion based on order statistics: the median, interquartile range, and quartile skewness (or Bowley skewness). These are just one set of possible estimators, and the approach, in fact, invites the use of any of a number of other robust statistical estimators for characterization and summary of the signal time and/or frequency energy distributions. This follows a philosophy of robust measurement for animal vocalizations established by Fristrup and Watkins (1993) in their work on the automatic measurement and classification of cetacean vocal signals. These order-statistic-based estimators do not break down as rapidly in the face of spurious outliers as do the familiar statistical moments (Mosteller & Tukey 1977). Thus, even when presented with signals from varying background noise and clutter conditions, and consequently of varying quality, these measures should provide robust and reliable estimates of characteristic signal features.
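The contrast between moments and order statistics can be sketched numerically. The example below is a minimal illustration under stated assumptions: the helper names (`weighted_quantile`, `moments`, `order_stats`), the Gaussian test spectrum, and the spurious bin carrying an extra 3% of the clean signal energy are all hypothetical.

```python
import numpy as np

def weighted_quantile(x, density, q):
    """Value of x at which the cumulative energy fraction first reaches q."""
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    return x[np.searchsorted(cdf, q)]

def moments(x, density):
    """Mean and standard deviation of x under the normalized density."""
    p = density / density.sum()
    mean = np.sum(x * p)
    sd = np.sqrt(np.sum(p * (x - mean) ** 2))
    return mean, sd

def order_stats(x, density):
    """Median and interquartile range of x under the density."""
    median = weighted_quantile(x, density, 0.5)
    iqr = weighted_quantile(x, density, 0.75) - weighted_quantile(x, density, 0.25)
    return median, iqr

# Hypothetical aggregate spectrum: a Gaussian band centered at 3 kHz.
freq = np.arange(0.0, 10000.0, 100.0)
clean = np.exp(-0.5 * ((freq - 3000.0) / 300.0) ** 2)

# Add a spurious bin at 9.5 kHz holding 3% of the clean signal energy.
noisy = clean.copy()
noisy[95] += 0.03 * clean.sum()

print("moments     clean:", moments(freq, clean), " noisy:", moments(freq, noisy))
print("order stats clean:", order_stats(freq, clean), " noisy:", order_stats(freq, noisy))
```

In this sketch the outlying bin shifts the mean and inflates the standard deviation several-fold, while the median and interquartile range are unchanged.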
The measurement process begins with a time-frequency decomposition of the signal of interest using short-time Fourier transforms (STFT), or generation of a time-frequency spectrogram (Fig 1a). In general, any other time-frequency representation of the signal could be used. From this spectrogram representation, we generate both an aggregate power envelope and an aggregate power spectrum representation of the signal. Before the aggregates are calculated, the spectrogram may be processed using any of a number of straightforward and computationally inexpensive noise removal, or denoising, algorithms. Being able to easily apply denoising algorithms prior to measurement is a distinct advantage of starting from a spectrogram representation of the signal.
The aggregate power envelope is generated by summing all the power values in each of the short-time spectra composing the spectrogram, resulting in a power versus time envelope (Fig 1b). Similarly, the aggregate power spectrum is generated by summing all the power values in each of the frequency bands, or narrow-band envelopes, composing the spectrogram, resulting in a power versus frequency spectrum (Fig 1c). Once the aggregates are generated, they are normalized to have unit area, and are treated like probability density functions with time and frequency as variates. Various measures of their central tendency and dispersion are calculated, based on robust order statistics. Whether a function of time or of frequency, the distribution is assessed in the same way; this results in a set of parallel measures for the signal energy distribution in both time and frequency.
Figure 1: Spectrogram representation (a) of a signal of interest. Generation of the aggregate time envelope (b) and frequency spectrum (c) is accomplished by summing power values in each short-time spectrum or narrow-band envelope respectively. The resulting aggregates (once normalized to have unit area) are treated like probability density functions with time and frequency as variates.
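The generation of the two aggregates can be sketched as follows. This is a minimal illustration, not the released implementation: it assumes a power spectrogram stored as an array with frequency along rows and time along columns, uniformly spaced time and frequency axes, and a hypothetical function name `aggregate_distributions`. The synthetic spectrogram stands in for an STFT of a real signal.

```python
import numpy as np

def aggregate_distributions(spec, times, freqs):
    """Unit-area aggregate power envelope and power spectrum.

    spec  : power spectrogram, shape (n_freqs, n_times)
    times : frame times (uniform spacing assumed)
    freqs : bin frequencies (uniform spacing assumed)
    """
    envelope = spec.sum(axis=0)   # sum each short-time spectrum -> power vs. time
    spectrum = spec.sum(axis=1)   # sum each narrow-band envelope -> power vs. frequency
    dt = times[1] - times[0]
    df = freqs[1] - freqs[0]
    envelope = envelope / (envelope.sum() * dt)   # normalize to unit area
    spectrum = spectrum / (spectrum.sum() * df)
    return envelope, spectrum

# Hypothetical spectrogram: one burst near 0.4 s and 2 kHz on a faint noise floor.
times = np.arange(0.0, 1.0, 0.01)        # 100 frames, seconds
freqs = np.arange(0.0, 8000.0, 125.0)    # 64 bins, Hz
tt, ff = np.meshgrid(times, freqs)       # both shaped (n_freqs, n_times)
spec = (np.exp(-0.5 * ((tt - 0.4) / 0.05) ** 2)
        * np.exp(-0.5 * ((ff - 2000.0) / 250.0) ** 2))
spec = spec + 0.001

envelope, spectrum = aggregate_distributions(spec, times, freqs)
```

Because both aggregates come out as unit-area functions of a single variate, the same measurement routine can be applied to either one.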
The principal measures extracted from the distributions include: the median (M), initial percentile (P1) value, terminal percentile (P2) value, interpercentile range (IPR), and percentile skewness (PS). In more familiar terms, these measures provide estimates of signal center-time or -frequency, duration or bandwidth, and envelope or spectrum symmetry. Notice that instead of using first and third quartile values, and interquartile range as measures of dispersion, we have chosen to take a more general approach. Here, we specify a particular fraction (P) of the total signal energy to be captured by the initial and terminal percentile values. Thus, an energy fraction of 0.5 would result in the familiar quartiles and interquartile range. An energy fraction of 0.8 would return the 10th and 90th percentile values, and the width of the range in between. Analogously, the quartile skewness becomes the more general percentile skewness, in which the symmetry of the initial-percentile to median distance and the median to terminal-percentile distance is assessed. Figures 2b and 2c show the median, initial percentile, terminal percentile, and interpercentile range values for the aggregate power envelope and power spectrum using P = 0.75. Thus, in this case, 75% of the total signal energy is captured in the interpercentile range.
Figure 2: Spectrogram representation (a) of our signal of interest with an overlay of the median frequency contour, which is based on the medians of the successive short-time spectra. The median (M), initial percentile (P1), terminal percentile (P2), and interpercentile range (IPR) values are shown for the aggregate time envelope (b) and frequency spectrum (c) for an energy fraction P = 0.75.
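The percentile measures can be sketched for a single distribution as follows. The function name `percentile_measures` and the toy envelope are hypothetical. The text does not give an explicit formula for percentile skewness, so the sketch assumes a Bowley-style generalization, PS = ((P2 − M) − (M − P1)) / IPR.

```python
import numpy as np

def percentile_measures(x, density, p=0.75):
    """Median, initial/terminal percentiles, interpercentile range, and skewness.

    x       : time or frequency value of each bin
    density : non-negative energy per bin (need not be normalized)
    p       : fraction of total energy captured between P1 and P2
    """
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    tail = (1.0 - p) / 2.0                       # energy left outside on each side
    m = x[np.searchsorted(cdf, 0.5)]
    p1 = x[np.searchsorted(cdf, tail)]
    p2 = x[np.searchsorted(cdf, 1.0 - tail)]
    ipr = p2 - p1
    # Assumed Bowley-style skewness: 0 when symmetric, positive when the
    # terminal side is longer than the initial side.
    ps = ((p2 - m) - (m - p1)) / ipr if ipr > 0 else 0.0
    return {"M": m, "P1": p1, "P2": p2, "IPR": ipr, "PS": ps}

# Hypothetical 8-bin envelope with a sharp onset and a long decay.
x = np.arange(8.0)
density = np.array([3.0, 8.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0])
measures = percentile_measures(x, density, p=0.75)
print(measures)
```

Calling the same function with p = 0.5 gives the quartiles and interquartile range, and the positive PS here reflects the long decay after the energy peak.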
A variety of other measures can be generated from the STFT signal representation following this basic philosophy of using robust order statistics for characterization of signal energy distributions. In addition, the approach naturally invites the generation of parallel measures in time and frequency. Of particular value is the extraction of frequency versus time and time versus frequency contours. These contours can be generated by measuring each of the short-time spectra and narrow-band envelopes composing the spectrogram as described above for the aggregates. The result is a set of contours describing the variation of any number of frequency or time measures (e.g., median, interpercentile range, and percentile skewness) with the corresponding time or frequency variate. Some familiar examples include center-frequency and bandwidth versus time contours. Paralleling these are the analogous center-time and duration versus frequency contours. Thus, the approach leads to a set of parallel time and frequency contour measures, similar to the percentile measures above. These contours can themselves be summarized using a number of derivative and order-statistic based measures. Figure 2a shows the median (center) frequency contour overlaid on the power spectrogram.
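As an illustration of contour extraction, the sketch below computes a median frequency versus time contour by measuring each short-time spectrum separately. The function name `median_frequency_contour` and the synthetic upsweep are assumptions, and the sketch further assumes that every frame contains some energy, so that each cumulative sum can be normalized.

```python
import numpy as np

def median_frequency_contour(spec, freqs):
    """Median frequency of each short-time spectrum.

    spec  : power spectrogram, shape (n_freqs, n_times), every column non-zero
    freqs : bin frequencies, shape (n_freqs,)
    """
    cdf = np.cumsum(spec, axis=0)
    cdf = cdf / cdf[-1, :]              # normalize each frame to unit total energy
    # The number of bins still below 0.5 is the index of the first bin reaching 0.5.
    idx = (cdf < 0.5).sum(axis=0)
    return freqs[idx]

# Hypothetical upsweep: band center rises 100 Hz per frame from 1 kHz.
freqs = np.arange(0.0, 5000.0, 50.0)
centers = 1000.0 + 100.0 * np.arange(20)
spec = np.exp(-0.5 * ((freqs[:, None] - centers[None, :]) / 150.0) ** 2)

contour = median_frequency_contour(spec, freqs)
print(contour)
```

Applying the same operation along the other axis (cumulative sum over time within each frequency band) would give the parallel center-time versus frequency contour.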
Finally, a number of measures can be obtained by considering the sorted aggregate energy distributions (Fig 3a,c) in addition to the straight distributions (Fristrup & Watkins 1992, 1993). These measures, when combined with the straight percentile measures, explore the compactness of the signal's energy distribution. The measures are designated as concentration (CTR), lower (L) value, upper (U) value, lower-upper range (LUR), and lower-upper skewness (LS). Concentration is measured as the number of bins (converted to the appropriate time or frequency units) needed to accumulate a fraction P of the total signal energy in the sorted energy distribution. This is a measure of the compacted duration or bandwidth for the signal. The value of concentration in relation to interpercentile range reveals how densely or loosely the signal's energy is distributed. The lower and upper values are the lowest and highest values respectively of time and/or frequency encountered in calculating the concentration. Whereas concentration gives the most compact measure of duration or bandwidth for the signal, the lower-upper range gives the most expansive. Figure 3 (b,d) shows the concentration, interpercentile range, and lower-upper range overlaid on the straight aggregate distributions for P = 0.75. Lower-upper skewness measures the asymmetry of the lower-upper range in relation to the median of the straight distribution.
Figure 3: The sorted aggregate time envelope (a) and frequency spectrum (c) for our signal of interest. Notice how the time and frequency indices are not sequential. The segment needed to accumulate a fraction P = 0.75 of the total signal energy is marked, and denoted as concentration (CTR). The concentration, interpercentile range (IPR), and lower-upper range (LUR) are shown together overlaid on the aggregate time envelope (b) and frequency spectrum (d).
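The sorted-distribution measures can be sketched in the same style. The function name `concentration_measures` and the toy spectrum are hypothetical, and uniform bin spacing is assumed. The text describes lower-upper skewness only qualitatively, so the formula used here, ((U − M) − (M − L)) / LUR with M the median of the straight distribution, is an assumption.

```python
import numpy as np

def concentration_measures(x, density, p=0.75):
    """Concentration, lower/upper values, lower-upper range, and skewness.

    x       : time or frequency value of each bin (uniform spacing assumed)
    density : non-negative energy per bin
    p       : fraction of total energy to accumulate in the sorted distribution
    """
    step = x[1] - x[0]
    order = np.argsort(-density, kind="stable")   # bins from most to least energetic
    cum = np.cumsum(density[order])
    cum = cum / cum[-1]
    n_bins = np.searchsorted(cum, p) + 1          # bins needed to reach fraction p
    used = order[:n_bins]
    ctr = n_bins * step                           # bin count converted to units of x
    lower, upper = x[used].min(), x[used].max()
    lur = upper - lower
    # Median of the straight (unsorted) distribution.
    cdf = np.cumsum(density)
    cdf = cdf / cdf[-1]
    m = x[np.searchsorted(cdf, 0.5)]
    # Assumed skewness formula; see the paragraph above.
    ls = ((upper - m) - (m - lower)) / lur if lur > 0 else 0.0
    return {"CTR": ctr, "L": lower, "U": upper, "LUR": lur, "LS": ls}

# Hypothetical 10-bin spectrum: a compact band plus one isolated energetic bin.
x = np.arange(10.0)
density = np.array([0.0, 2.0, 3.0, 12.0, 10.0, 2.0, 0.0, 0.0, 0.0, 4.0])
result = concentration_measures(x, density, p=0.75)
print(result)
```

In this toy case three bins suffice to hold 75% of the energy, but because one of them lies far from the main band, the lower-upper range is twice the concentration.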
As with the percentile measures (median, interpercentile range, and percentile skewness), the concentration, lower-upper range, and lower-upper skewness can be measured for each short-time spectrum and narrow-band envelope composing the spectrogram, generating the corresponding time or frequency contours. As mentioned above, these contours can themselves be summarized using a number of derivative and order-statistic based measures. Thus, adding new distributional measures and following out the parallelism in time and frequency, both for aggregate and for short-time or narrow-band distributions, can rapidly result in a large, comprehensive, and robust feature set to characterize signals of interest.
Fristrup, K. M. & Watkins, W. A. (1992) Characterizing acoustic features of marine animal sounds. Woods Hole Oceanographic Institution Technical Report WHOI-92-04.
Fristrup, K. M. & Watkins, W. A. (1993) Marine animal sound classification. Woods Hole Oceanographic Institution Technical Report WHOI-94-13.
Mosteller, F. & Tukey, J. W. (1977) Data analysis and regression: a second course in statistics. Addison-Wesley Publishing Company, Reading, Massachusetts.
Link to specific implementations:
Energy Distribution Measurement
(Release 17 November 2004, for XBAT)