Beat tracking and reaction time

Nick Collins & Ian Cross

Cambridge University, UK

A drawback of current computational beat induction models is their slow reaction time at period and phase transitions when compared to human listeners. This is a particularly critical problem for applications in real-time interactive computer music performance systems. Models typically use some form of correlation search through an energy signal, exhaustively testing period and phase hypotheses over a 3-6 second temporal window (Davies 2005, Laroche 2003, Scheirer 1998). Adaptation to changes is therefore slow: realignment may be delayed by a time on the order of the length of that window.
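
As an illustration only, the kind of exhaustive search described above might be sketched as follows. This is a minimal sketch and not the method of any of the cited models: the function name `best_period_phase`, the 60-180 bpm search range, the frame-based envelope representation and the mean-energy score are all assumptions made for the example.

```python
import numpy as np

def best_period_phase(envelope, frame_rate, min_bpm=60.0, max_bpm=180.0):
    """Exhaustively score (period, phase) hypotheses against an energy envelope.

    envelope   : 1-D sequence of onset energy, one value per analysis frame,
                 covering the whole temporal window (e.g. 3-6 seconds).
    frame_rate : frames per second of the envelope (assumed constant).
    Returns (period, phase) in frames for the best-scoring pulse hypothesis.
    The tempo range and the scoring rule are illustrative assumptions.
    """
    envelope = np.asarray(envelope, dtype=float)
    min_period = int(round(frame_rate * 60.0 / max_bpm))
    max_period = int(round(frame_rate * 60.0 / min_bpm))

    best_score, best_period, best_phase = -np.inf, None, None
    for period in range(min_period, max_period + 1):
        for phase in range(period):
            # Score = mean envelope energy at the predicted beat positions.
            score = envelope[phase::period].mean()
            # Strict comparison: on ties the shorter period found first is kept.
            if score > best_score:
                best_score, best_period, best_phase = score, period, phase
    return best_period, best_phase


# Synthetic 4 second window at 100 frames per second, with an impulse
# every 50 frames (120 bpm) starting at frame 10.
window = np.zeros(400)
window[10::50] = 1.0
period, phase = best_period_phase(window, frame_rate=100.0)
```

Because every hypothesis is scored over the whole window, energy from before a transition keeps contributing to the score until it has passed out of the window, which is one way of seeing why realignment can lag by roughly the window length.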

It does not help that evaluation procedures for engineering work on beat tracking rely on measures such as the ‘longest continuously tracked segment’, which do not adequately test behavior at transitions. Recovering after a transition may necessarily involve a temporary loss of the pulse, and such a loss is penalized by continuous tracking scores. The issue of behavior at transitions has been inadequately tackled in the beat tracking literature.
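
To make the point concrete, one simplified reading of a ‘longest continuously tracked segment’ score is sketched below. The exact definitions used in the literature vary; the function name `longest_tracked_fraction`, the 70 ms tolerance and the matching rule are assumptions made for this example.

```python
def longest_tracked_fraction(annotated, tracked, tolerance=0.07):
    """Fraction of annotated beats lying in the longest unbroken tracked run.

    annotated : list of ground-truth beat times in seconds.
    tracked   : list of beat times reported by the tracker, in seconds.
    tolerance : maximum deviation (seconds) for a beat to count as tracked;
                the 70 ms default is an illustrative assumption.
    """
    if not annotated:
        return 0.0
    longest = current = 0
    for beat in annotated:
        # A beat counts as tracked if any reported beat falls within tolerance.
        hit = any(abs(beat - t) <= tolerance for t in tracked)
        current = current + 1 if hit else 0
        longest = max(longest, current)
    return longest / len(annotated)


# Ten annotated beats at 120 bpm; the tracker loses the pulse for two beats
# in the middle (for example around a transition) and then recovers.
annotated = [0.5 * i for i in range(10)]
tracked = [0.0, 0.5, 1.0, 1.5, 3.0, 3.5, 4.0, 4.5]
score = longest_tracked_fraction(annotated, tracked)
```

Under this reading, a tracker that drops only two beats while recovering is credited with less than half of the excerpt, and the score gives no indication of how quickly the recovery took place.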

A preliminary experiment was undertaken to determine how quickly subjects could recover from abrupt shifts of stimulus phase and period in a tapping task on real polyphonic audio tracks. The stimuli were 4/4 popular music of a kind culturally familiar to the subjects and typically tested in computer beat induction research. Results show reaction times comparable with the rapid period correction achieved in tapping tasks to impoverished isochronous stimuli (Repp 2001), and with the sub-second time for genre recognition shown experimentally by Perrot and Gjerdingen (1999) and posited to underlie transcription tasks by Hainsworth (2004).

Thus, intelligent listening strategies may depend on genre and instrument recognition cues. Humans can parse events within the context of prior knowledge of musical style and the associated conventions linking timbral, pitch and rhythmic structure. The role of dynamic, durational and melodic accents (Parncutt 1994, McKinney and Moelants 2004) may be guided or supplemented in polyphonic audio tracking by information on the relative importance of instrumental parts within a given genre.

The implication of this work for beat induction is to favour further work in genre-specific signal analysis, along the lines of Masataka Goto’s drum pattern and chord change detection (Goto 2001). A comparison of implementations of beat tracking models is described; it suggests the utility of signal event analysis information for accurate and fast synchronization (without pi-phase errors), as exhibited by the Goto model.