There are many different types of transitions between notes, even when considering only one instrument. Dynamics are of concern when characterizing transitions; for example, the first note could be quiet and the second loud. The pitch interval between the two notes is also relevant. In the case of a violin, one must further consider bow pressure, bow speed, bow location, and pitch.
Research [4] has also shown that when a subject identifies musical instruments from only short excerpts of the timbre (e.g., just the attack, sustain, or release), the attack has the greatest influence on the subject's decision. This implies that transitions between steady states may provide important cues that should be synthesized accurately.
Figure 2.1: A track represents the time-varying frequency and amplitude of a single-frequency component in the analyzed sound [11].
Figure 2.2 illustrates how a loud tone can mask nearby tones of lower amplitude. For example, if one listens to a 415 Hz sinusoid at 0 dB summed with an 800 Hz sinusoid at -30 dB, the higher-frequency sinusoid may not be perceived. However, if the two sinusoids were kept at the same amplitudes but the 800 Hz sinusoid were moved to 4000 Hz, a listener would perceive both. This psychoacoustic effect is called masking.
Figure 2.2 [Currently not available]: Masking levels corresponding to a 415 Hz sinusoid [13].
In the original MQ technique, one global threshold parameter (or noise floor) is used across the entire spectrum in each frame of the analysis. Any peaks found below this threshold are assumed to be noise and are not included in the construction of the current frame. There also exists a local threshold that is applied individually to base-two logarithmic sections of the total spectrum. Peaks whose amplitudes fall below the highest peak's amplitude in that frequency bin minus the local threshold are thrown out. From the previous example, let us assume that the local threshold for each bin is 20 dB. The 800 Hz sinusoid at -30 dB would be thrown out due to the amplitude of the 415 Hz peak, assuming that both peaks were within the same frequency bin. However, if the loudest peak in the bin containing 4000 Hz were at -15 dB, the 4000 Hz, -30 dB sinusoid would be kept. This extension roughly simulates the masking properties of the ear.
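The interaction of the global noise floor and the per-octave local threshold can be summarized in a short sketch. The following C fragment is only a minimal illustration of the idea described above; the peak record, the 20 Hz octave reference, and the function names are assumptions for this sketch and are not taken from Lemur's source.

#include <math.h>
#include <stdio.h>

/* Hypothetical peak record: frequency in Hz, amplitude in dB. */
typedef struct {
    double freqHz;
    double ampDb;
    int    keep;
} Peak;

/* Index of the base-two logarithmic section (octave bin) containing freqHz.
 * The 20 Hz reference is an arbitrary choice for this sketch. */
static int octaveBin(double freqHz)
{
    return (int)floor(log2(freqHz / 20.0));
}

/* Keep a peak only if it clears the global noise floor AND lies within
 * localThresholdDb of the loudest peak in its own octave bin. */
static void pickPeaks(Peak *peaks, int n, double noiseFloorDb, double localThresholdDb)
{
    double maxInBin[32];
    for (int b = 0; b < 32; b++)
        maxInBin[b] = -1000.0;

    /* First pass: loudest peak per octave bin. */
    for (int i = 0; i < n; i++) {
        int b = octaveBin(peaks[i].freqHz);
        if (peaks[i].ampDb > maxInBin[b])
            maxInBin[b] = peaks[i].ampDb;
    }

    /* Second pass: apply the global and local thresholds. */
    for (int i = 0; i < n; i++) {
        int b = octaveBin(peaks[i].freqHz);
        peaks[i].keep = peaks[i].ampDb >= noiseFloorDb &&
                        peaks[i].ampDb >= maxInBin[b] - localThresholdDb;
    }
}

int main(void)
{
    /* A quiet peak sharing an octave bin with a loud one is discarded,
     * while an equally quiet peak in a quieter bin is kept. */
    Peak peaks[] = { {  415.0,   0.0, 0 }, {  480.0, -30.0, 0 },
                     { 3000.0, -15.0, 0 }, { 4000.0, -30.0, 0 } };
    pickPeaks(peaks, 4, -70.0, 20.0);
    for (int i = 0; i < 4; i++)
        printf("%7.0f Hz, %6.1f dB: %s\n", peaks[i].freqHz, peaks[i].ampDb,
               peaks[i].keep ? "kept" : "discarded");
    return 0;
}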
Fitz and Walker also extended the MQ algorithm with hysteresis. An effect affectionately dubbed the "doodley-doo" effect by James Beauchamp appears when a sinusoid with time-varying amplitude (amplitude modulation) crosses a threshold multiple times. This is most evident in harmonic sounds with amplitude vibrato. At a high point in the sinusoid's amplitude, the signal crosses above the threshold and is included in that frame's list of peaks; at a low point, its amplitude drops below the threshold and the peak is thrown out. Figure 2.3 shows a time-frequency plot of this artifact. After synthesis, the rapid turning on and off of the sinusoid is quite audible and undesirable.
Figure 2.3: Example of the "doodley-doo" effect. Notice the disappearance and reappearance of the partials at ~4400 Hz and ~5000 Hz.
To solve this problem, Lemur provides a third threshold setting: the analysis hysteresis. Hysteresis allows a track to drop a specified amount below the local frequency-bin threshold for any length of time. If the track rises back above the threshold, it continues as before; if it falls below the local threshold minus the hysteresis, it dies. For example, if a sinusoid with a -20 dB amplitude and +/-3 dB of amplitude vibrato is analyzed with a -22 dB threshold and 5 dB of hysteresis, the track would not die even though it travels below the local threshold. If, however, the track's overall amplitude dropped to -24 dB, the track would be killed and reborn on each cycle of the vibrato. Knowing this, one must choose the hysteresis parameter carefully, preferably so that it accommodates the amplitude vibrato on every harmonic of interest.
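A minimal sketch of how such hysteresis might be applied, frame by frame, to a single track is given below. The state machine and the names are illustrative assumptions, not Lemur's actual code; the example values reproduce the scenario just described.

#include <stdio.h>

/* Sketch of threshold hysteresis for track continuation: an existing track
 * may dip up to `hysteresisDb` below the local threshold before it is
 * killed, but a new track is only born above the threshold itself. */
typedef enum { TRACK_DEAD, TRACK_ALIVE } TrackState;

static TrackState updateTrack(TrackState state, double ampDb,
                              double localThresholdDb, double hysteresisDb)
{
    if (state == TRACK_ALIVE) {
        /* Track survives unless it falls below threshold minus hysteresis. */
        return (ampDb < localThresholdDb - hysteresisDb) ? TRACK_DEAD : TRACK_ALIVE;
    }
    /* A dead track is reborn only when it rises above the threshold. */
    return (ampDb >= localThresholdDb) ? TRACK_ALIVE : TRACK_DEAD;
}

int main(void)
{
    /* The text's example: a -20 dB partial with +/-3 dB of amplitude vibrato,
     * a -22 dB threshold, and 5 dB of hysteresis.  The track dips to -23 dB,
     * which is below the threshold but above -27 dB, so it never dies. */
    double amps[] = { -17.0, -20.0, -23.0, -20.0, -17.0, -20.0, -23.0 };
    TrackState s = TRACK_ALIVE;
    for (int i = 0; i < 7; i++) {
        s = updateTrack(s, amps[i], -22.0, 5.0);
        printf("frame %d: %6.1f dB -> %s\n", i, amps[i],
               s == TRACK_ALIVE ? "alive" : "dead");
    }
    return 0;
}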
Not all analysis modifications correspond to actual differences in playing the instrument. In a frequency-scaling or frequency-shifting scenario, one must consider the instrument's characteristic frequency response. If one were to multiply the frequency of each harmonic by two, the amplitude of each harmonic would stay the same. This is misleading, since the actual response of the instrument has not been shifted up in frequency by a factor of two. In other words, scaling all harmonics by some factor does not necessarily imply changing the pitch by the same amount. On a real instrument, playing one note and then another an octave higher does not shift the instrument's frequency response. In sum, playing two notes an octave apart corresponds to a change in timbre as well as a change in pitch. This does not mean that we cannot perform slight modifications, but it does mean that we must consider such factors before performing certain modifications in the time-frequency domain [1].
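To make the caveat concrete, the sketch below performs the naive scaling described above: every harmonic's frequency is multiplied by two while its amplitude is copied unchanged, so the instrument's fixed frequency response is effectively dragged along with the harmonics. The data layout is hypothetical.

#include <stdio.h>

/* A hypothetical harmonic: frequency in Hz, amplitude in dB. */
typedef struct {
    double freqHz;
    double ampDb;
} Harmonic;

/* Naive frequency scaling: every partial is moved by the same factor, but
 * its amplitude is copied verbatim.  The instrument's fixed frequency
 * response is therefore dragged along with the harmonics, which is not
 * what happens when the note is actually played an octave higher. */
static void scaleFrequencies(Harmonic *h, int n, double factor)
{
    for (int i = 0; i < n; i++)
        h[i].freqHz *= factor;   /* the pitch changes ...        */
                                 /* ... but h[i].ampDb does not. */
}

int main(void)
{
    Harmonic note[] = { { 440.0, -6.0 }, { 880.0, -12.0 }, { 1320.0, -20.0 } };
    scaleFrequencies(note, 3, 2.0);
    for (int i = 0; i < 3; i++)
        printf("%7.1f Hz at %5.1f dB\n", note[i].freqHz, note[i].ampDb);
    return 0;
}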
Figure 3.1 [Currently not available]: A typical time-frequency plot generated by a harmonic analysis technique [17].
Figure 3.2: A typical LemurEdit graph. Pertinent analysis parameters are visible at the bottom of the graph; peak data at the top of the graph are updated dynamically given the current cursor location.
Altering the slope of the ramp function can either stretch or shrink the duration of the sound being synthesized without introducing the "chipmunk" effect. This is because we are stretching (or shrinking) the duration of each harmonic in the frequency domain, instead of stretching (or shrinking) the entire time-domain signal. A negative slope causes the sound to play backwards. Using a sine wave as the time function produces an interesting effect: the sound moves backwards and forwards at varying speeds.
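One way to picture the time function is as a mapping from elapsed synthesis time to an analysis-frame index, as in the sketch below; a ramp of slope one plays the analysis at its original rate, a shallower or negative slope stretches or reverses it, and a sine wave sweeps back and forth. The function names and the frame-indexing scheme are assumptions made for illustration, not the Kyma/Capybara implementation.

#include <math.h>
#include <stdio.h>

#define TWO_PI 6.283185307179586

/* Map elapsed synthesis time (measured in frames here, for simplicity) to
 * an analysis-frame index.  Because each oscillator's frequency comes from
 * the indexed frame, stretching this mapping stretches the duration
 * without transposing the pitch. */

/* Linear ramp: slope 1.0 plays at the original rate, 0.5 stretches the
 * sound to twice its length, and a negative slope plays it backwards. */
static double rampIndex(double t, double slope, double offset)
{
    return offset + slope * t;
}

/* Sine time function: the sound moves backwards and forwards around the
 * middle of the analysis at varying speeds. */
static double sineIndex(double t, double centre, double depth, double rate)
{
    return centre + depth * sin(TWO_PI * rate * t);
}

/* Clamp and round to a valid frame number before indexing the analysis. */
static int clampFrame(double index, int numFrames)
{
    if (index < 0.0)
        return 0;
    if (index > (double)(numFrames - 1))
        return numFrames - 1;
    return (int)(index + 0.5);
}

int main(void)
{
    const int numFrames = 816;
    for (double t = 0.0; t <= 40.0; t += 10.0)
        printf("t=%4.0f  ramp(slope 0.5): frame %3d  sine: frame %3d\n", t,
               clampFrame(rampIndex(t, 0.5, 0.0), numFrames),
               clampFrame(sineIndex(t, 408.0, 200.0, 0.01), numFrames));
    return 0;
}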
To synthesize, an analysis of a sound must first be made using Lemur. The file is then saved in a special format called an SOS analysis file. This contains all the information for each oscillator, such as when to turn on and off and what its amplitude and frequency are at each frame. Given the limited number of oscillators, additional psychoacoustic processing is done to reduce the amount of data present in the analysis. The SOS program, SOS analysis file, and time function are loaded into the Capybara. Finally, a protocol is used by the Macintosh to tell the Capybara when to start playing the sound. The protocol also allows the user to access memory directly within the Capybara, which is useful for changing the amplitude and frequency envelopes while the SOS analysis file is still in memory.
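A useful mental model of the SOS analysis file is a set of per-frame breakpoints for each oscillator. The declarations below are only an illustrative layout reflecting the information listed above (on/off times plus an amplitude and a frequency per frame); the real SOS format is defined by the Kyma software and is not reproduced here.

#include <stdint.h>

/* Illustrative (not actual) layout of the information an SOS analysis file
 * must carry for each oscillator: when to turn on and off, and an amplitude
 * and frequency for every frame while the oscillator is active. */
typedef struct {
    float amplitude;       /* linear amplitude for this frame          */
    float frequencyHz;     /* oscillator frequency for this frame (Hz) */
} SOSBreakpoint;

typedef struct {
    uint32_t       onFrame;       /* frame at which the oscillator starts */
    uint32_t       offFrame;      /* frame at which it stops              */
    SOSBreakpoint *breakpoints;   /* one entry per active frame           */
} SOSOscillator;

typedef struct {
    uint32_t       numOscillators;
    uint32_t       numFrames;
    double         frameLengthSeconds;
    SOSOscillator *oscillators;
} SOSAnalysis;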
Once an analysis and its corresponding SOS equivalent have been loaded into the Macintosh and the Capybara, respectively, we can choose the portions of the analysis we would like to hear. This is done by sending a series of messages to the Capybara from the Macintosh that invert the amplitude envelopes of the tracks we wish to mute. Whenever the SOS program comes across a peak in a track with a negative amplitude, it turns off that track's oscillator. This feature allows the user to listen to individual harmonics, perhaps a combination of specified harmonics, or even tracks of a given length. In our transition research, we found it invaluable for listening to what we refer to as "lingering tracks": harmonics that do not die instantly once the transition region has been reached, but continue onwards for as long as 300 ms in some cases.
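A minimal sketch of this muting convention is shown below. The track structure and function names are hypothetical, but the sign convention follows the description above: the Macintosh negates the amplitude envelope of a track, and the synthesis side treats a negative amplitude as "oscillator off."

#include <stdio.h>

/* Illustrative track record: one amplitude per frame the track is alive. */
typedef struct {
    int     trackId;
    int     numPeaks;
    double *amplitudes;
} Track;

/* "Mute" a track in place by inverting the sign of its amplitude envelope. */
static void muteTrack(Track *t)
{
    for (int i = 0; i < t->numPeaks; i++)
        t->amplitudes[i] = -t->amplitudes[i];
}

/* What the synthesizer side would do: a negative amplitude means the
 * oscillator for this track is turned off for that frame. */
static double oscillatorGain(const Track *t, int frame)
{
    double a = t->amplitudes[frame];
    return (a < 0.0) ? 0.0 : a;
}

int main(void)
{
    double env[] = { 0.2, 0.5, 0.4 };
    Track t = { 7, 3, env };
    muteTrack(&t);                 /* listen to everything except track 7 */
    for (int i = 0; i < t.numPeaks; i++)
        printf("frame %d: gain %.2f\n", i, oscillatorGain(&t, i));
    return 0;
}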
Transition Type #1: We found this type of transition to be the most characteristic of harmonic behavior within the transition region. The death of the first track occurs two to three frames later in time than the birth of the second. Two oscillators are required to synthesize this partial.
Figure 4.1: Three types of harmonic behavior across the transition region.
Transition Type #2: This is a Lemur artifact and is presumed not to be characteristic of transitions. The two tracks spanning the transition region lie within the allowed frequency drift of the specified analysis parameters, causing Lemur to connect them. At synthesis time, only one oscillator is necessary to synthesize this portion. This transition type typically occurs when harmonics of the two notes coincidentally line up. Numerous examples can be seen in the analyses shown in Appendix C; for example, most of the half-step transitions resulted in the connection of the fundamentals of both notes. The artifact can also occur for higher harmonics: in Figure C.6, the 9th harmonic of the first note is connected with the 8th harmonic of the second. This poses a small problem when creating the library, which is dealt with in Section 5.2.
Transition Type #3: This is a special case of type #1. Here the death of the first track and the birth of the second occur within the same frame, and again, two oscillators are necessary to synthesize this partial. It was speculated in [9] that such a discontinuity in frequency would result in "chirps." Further research showed that this is not the case since the amplitudes ramp up and ramp down; they do not cut off abruptly.
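The three types can be distinguished mechanically from track timing alone, as the sketch below illustrates. The record of birth and death frames, the three-frame tolerance, and the "connected" flag are assumptions made for illustration, not part of the analysis software.

#include <stdio.h>

/* Illustrative classification of the three transition types, given the
 * frames at which the first note's track dies and the second note's track
 * is born.  Type #2 (a single connected track) is flagged separately,
 * since only one oscillator is involved. */
typedef enum {
    TYPE1_OVERLAP,     /* first track dies 2-3 frames after the second is born    */
    TYPE2_CONNECTED,   /* Lemur joined the two tracks into one (analysis artifact) */
    TYPE3_SAME_FRAME,  /* death and birth fall in the same frame                   */
    TYPE_OTHER
} TransitionType;

static TransitionType classify(int connected, int deathFrameFirst, int birthFrameSecond)
{
    if (connected)
        return TYPE2_CONNECTED;
    if (deathFrameFirst == birthFrameSecond)
        return TYPE3_SAME_FRAME;
    if (deathFrameFirst > birthFrameSecond &&
        deathFrameFirst - birthFrameSecond <= 3)
        return TYPE1_OVERLAP;
    return TYPE_OTHER;
}

int main(void)
{
    printf("%d %d %d\n",
           classify(0, 103, 100),    /* Type #1: 3 frames of overlap */
           classify(1, 0, 0),        /* Type #2: one connected track */
           classify(0, 100, 100));   /* Type #3: same frame          */
    return 0;
}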
Figure 4.2: Close-up of first five harmonics of Figure C.6. Notice the lingering tracks of the second and fifth harmonics of the first note.
that lingering tracks are typical of transitions. We can look at the higher harmonics of half step transitions and notice that as the corresponding harmonics drift further apart, lingering tracks start to reappear. (See harmonic #6 of Figure C.16.)
Likewise, the initial attack shows the first seven harmonics beginning together, while the higher harmonics gradually fade into existence. At the transition attack, however, all of the harmonics are present immediately at the beginning, which further supports our need to employ special transition processing.
Clearly, the larger the library, the better the synthesis. Using the transition in Figure C.10 as an example, we have an analysis 188 kbytes in length, containing 816 frames. If we use all lingering tracks within the analysis, we must store approximately 80 frames of data, or ~20 kbytes of data per transition. For today's computer systems, it is not inconceivable to be able to store hundreds of transitions for a given instrument. With this in mind, it makes sense to include both transitions shown in Figures C.10 and C.11 in our library.
Figure 5.1: First note of the transition from Figure C.9. Notice the need to time-cut harmonic #9 and the other higher-frequency harmonics. Harmonics #4 and #5 are lingering tracks (they belong to the transition release of the first note) and should not be time-cut.
Figure 5.2: Second note of Transition #A9.
is created. See Figure 5.1 and Figure 5.2 for an example of the separation of a transition. Occasionally, a harmonic from the first note erroneously connects with a harmonic from the second note, as in Transition Type #2 of Section 4.1. In this case, the offending track is included in both the first- and second-note analyses, and a time-cut operation is used to remove the unwanted portion. The final step is to remove the steady-state portions of the first and second notes. This requires some careful consideration as to where the transition region begins and ends.
Another consideration is what to do with noise tracks: short tracks at typically low amplitudes. If one examines any of the transition plots in Appendix C, one will see that they occur at the beginnings of sounds (i.e., while the first note is being played) and at frequencies under ~200 Hz. Noise tracks found within the transition region are kept along with the transition components as they are extracted from the analyses. After the transition region has been chosen, any noise track whose birth lies to the left of the transition-region median is assigned to the first note, while any noise track born to the right of the median is assigned to the second. Noise tracks found below the fundamental are omitted altogether; they are considered characteristic of the instrument as a whole, should be synthesized separately, and are not included in the transition-processing step.
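The bookkeeping for noise tracks can be sketched as follows. The function, the region bounds, and the fundamental value used in the example are hypothetical, but the rules mirror the ones just described.

#include <stdio.h>

/* Assign a noise track to the first note, the second note, or neither,
 * based on its birth frame relative to the transition-region median and
 * on whether it lies below the fundamental. */
typedef enum { ASSIGN_FIRST_NOTE, ASSIGN_SECOND_NOTE, ASSIGN_OMIT } Assignment;

static Assignment assignNoiseTrack(int birthFrame, double meanFreqHz,
                                   int regionStart, int regionEnd,
                                   double fundamentalHz)
{
    int median = (regionStart + regionEnd) / 2;

    /* Sub-fundamental noise is considered characteristic of the whole
     * instrument and is synthesized separately. */
    if (meanFreqHz < fundamentalHz)
        return ASSIGN_OMIT;

    return (birthFrame <= median) ? ASSIGN_FIRST_NOTE : ASSIGN_SECOND_NOTE;
}

int main(void)
{
    /* Hypothetical transition region spanning frames 400-480, with a
     * fundamental near 494 Hz. */
    printf("%d\n", assignNoiseTrack(410, 900.0, 400, 480, 494.0));  /* first note  */
    printf("%d\n", assignNoiseTrack(470, 900.0, 400, 480, 494.0));  /* second note */
    printf("%d\n", assignNoiseTrack(430, 150.0, 400, 480, 494.0));  /* omitted     */
    return 0;
}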
threshold, transition processing occurs, and we ask the library for a transition release for the first note and a transition attack for the second. The library will either return a "hit" or a "miss," given the parameters specified by the performer.
A library "hit" implies that a transition meeting the performer's requests is available for immediate synthesis. In this case, the transition release is recalled and synthesized in real time using the additive synthesis SOS program, running on the Capybara Signal Processor. If the library reports a "miss," the algorithm must find the closest transition in the library and synthesize it using interpolation to match the transition with what was requested by the performer. If the pitch is not close enough, we use Lemur to perform frequency scaling. If the amplitude is not right, simple scaling of each peak's amplitude occurs. These two parameters (in the case of our violin transitions) are the easiest values to interpolate, since they require little processing. Other parameters, such as bow change vs. no-bow change are more difficult, so it makes sense to include more varieties of transition attacks and releases with respect to that parameter in the library.
This appendix contains the distribution manual that accompanies LemurEdit. LemurEdit can be obtained via anonymous ftp from "unicorn.cerl.uiuc.edu" in the "/pub/lemur" directory.
A Graphical Editing Tool for Lemur(TM) Analyses
Written by Bryan Holloway (Homey@uiuc.edu)
CERL Sound Group
University of Illinois
Requirements:
LemurEdit requires a Mac II or greater with an FPU. The amount of memory needed is directly proportional to the size of the analysis files you want to load; 4 MB is recommended for a one- to two-second analysis with lots of peak data. LemurEdit is known to run on System 6.0.5 or higher.
High gray-scale resolutions should be chosen in the Monitors section of the Control Panel since amplitude is represented by varying shades of gray.
Viewing a Lemur analysis:
LemurEdit allows the user to examine closely a Lemur analysis of a sound. Launching the application will show a quick splash page. Click the mouse or hit any key. You will then be prompted for a Lemur analysis file-name. After loading the file, you can zoom in on different regions of the graph. Use a click-and-drag operation to select a region of interest. Command-Z or selecting "Zoom" from the Actions menu will display a close-up of the selected region.
To reload a full-size version of the analysis, choose "Full Zoom Out" in the Edit Menu.
Multiple Lemur analyses can be open at the same time.
Amplitude scaling can be changed from a logarithmic scale to a linear scale from the Scaling menu. LemurEdit defaults to a logarithmic scale since it tends to show more amplitude detail.
Editing a Lemur analysis:
There are a number of operations that can be performed on a Lemur analysis. At any time you can save the file ("Save" in the File menu) to its original file-name, or use "Save As..." (in the File menu) to specify a new Lemur Analysis file. Saving is highly recommended since at this time there is no Undo option.
Saving selected tracks:
A check-box has been added to the standard "Save As..." dialog to allow the saving of ONLY selected tracks from an analysis. This feature is extremely useful in picking out portions of an analysis and saving them into different files. There are two ways to select tracks:
a) Manual track selection -- Simply clicking the mouse button will select the closest track in the vertical direction. If no track passes through the frame where the cursor is, no track will be selected.
b) Region track selection -- Select a region as if you were going to zoom. Press Command-S or choose "Select Tracks" from the Actions menu to select those tracks within the region. Press Command-D or choose "De-select Tracks" from the Actions menu to de-select those tracks. Press Command-T or choose "Toggle tracks" to turn off tracks that were previously on, and turn on tracks that were previously off. NOTE: As of this version, only tracks whose BIRTH lies within the region will be selected (or de-selected.) In other words, if a region does not enclose the birth of a track, it will NOT be selected, even if part of the track does lie within the region.
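For readers writing their own processing code, the birth-in-region rule can be summarized in a short sketch. The structures and function below are illustrative only and are not LemurEdit's internal data types.

#include <stdio.h>

/* A track is selected (or de-selected, or toggled) only if its BIRTH falls
 * inside the chosen rectangle of frames and frequencies. */
typedef struct {
    int    birthFrame;
    double birthFreqHz;
    int    selected;
} TrackInfo;

typedef struct {
    int    frameLo, frameHi;
    double freqLoHz, freqHiHz;
} Region;

static void selectTracksInRegion(TrackInfo *tracks, int n, Region r, int toggle)
{
    for (int i = 0; i < n; i++) {
        TrackInfo *t = &tracks[i];
        if (t->birthFrame >= r.frameLo && t->birthFrame <= r.frameHi &&
            t->birthFreqHz >= r.freqLoHz && t->birthFreqHz <= r.freqHiHz)
            t->selected = toggle ? !t->selected : 1;
        /* A track that merely passes through the region, but was born
         * outside it, is left untouched. */
    }
}

int main(void)
{
    TrackInfo tracks[] = { { 10, 440.0, 0 }, { 90, 440.0, 0 } };
    Region r = { 0, 50, 0.0, 1000.0 };
    selectTracksInRegion(tracks, 2, r, 0);
    printf("track 0: %d, track 1: %d\n", tracks[0].selected, tracks[1].selected);
    return 0;
}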
Time-Cutting:
Chunks of time can be removed from a Lemur analysis by selecting an appropriate region in time and hitting Command-X or selecting "Time-cut" from the Edit menu. LemurEdit will attempt to join the ends on either side of the cut as well as it can, using the same parameters that the original Lemur analysis used (i.e., the frequency drift).
Track Connection:
To join two tracks, select two tracks and press Command-C or choose "Connect Tracks" from the Edit Menu. Some restrictions are imposed upon track connections. First, you may not connect tracks that overlap in time. Second, if you try to connect tracks that could not be connected in Lemur due to the frequency-drift parameter of the analysis, LemurEdit will warn you, but still allow you to join them. Note that connecting two tracks does not change the analysis file on disk.
Labeling Peaks:
A 16-bit label field has been added to the peak structure. One can label these peaks and then use that information when doing one's own processing of Lemur analysis files. Peak labels do not have to be the same for a given track. To label peaks, select tracks OR a region and press Command-L or choose "Label Peaks" in the Actions Menu. This feature was added particularly for research purposes and may not otherwise be useful.
Playing a Lemur analysis in real time using the Capybara(TM):
This is part of work in progress and should only be used after consultation with the CERL Sound Group. If you have a Capybara, you can use a Sum-Of-Sines (SOS) Kyma sound to play back Lemur Analysis files in real time. Start Kyma and play the Setup sound that corresponds to the SOS-version of the analysis you are going to synthesize. Then launch LemurEdit, and load the Lemur Analysis version of the analysis. From the Options menu choose:
"Play" (Command-P) to play the selected tracks.
The Capybara will play the selected tracks and then loop back after reaching the end of the analysis. Press Command-K to stop playing.
Known bugs:
Occasionally when playing analysis files through the Capybara, the SOS program resident on the Capybara will hang. This can be remedied by replaying the Setup sound in Kyma.
This appendix illustrates the advanced data types used in LemurEdit. Table B.1 is a description of the WindowSet structure that represents an analysis file and its graphs in memory. Table B.2 is a description of the LemurWindow structure that represents a particular region of an analysis displayed on the Macintosh screen. Table B.3 shows the header of a Lemur analysis file, and Figure B.1 is a block diagram illustrating the way in which frames and peaks are stored on disk.
Table B.1: WindowSet Structure.
Variable              Comment
firstLemurWin;        Pointer to linked-list of corresponding LemurWindows.
parameters;           Analysis parameters for this WindowSet.
firstFrame;           Pointer to first frame in the Lemur analysis file.
listOfTracks;         Pointer to a linked-list of tracks selected by the user.
trackTable;           Pointer to a track-table structure. This includes parameters used by the SOS protocol.
maximumAmplitude;     The loudest peak's amplitude within the entire analysis. This value is used for plotting shades of gray to represent amplitude.
numberOfFrames;       Number of frames in this analysis.
numberOfPeaks;        Number of peaks in this analysis.
numberOfSelTrks;      Number of user-selected tracks.
lemurFileSpec;        Macintosh System 7 file data structure for this analysis file.
lemurFileRefNum;      Macintosh file marker for accessing the analysis file.
numGraphs;            The number of LemurWindows currently open with this WindowSet.
defGWidth;            Default LemurWindow width (in pixels).
defGHeight;           Default LemurWindow height (in pixels).
baseAddr;             Pointer to base of Macintosh memory allocated for a WindowSet's frame and peak structures.
currentAddr;          Pointer to current frame and peak memory allocation address.
TTbaseAddr;           Pointer to base of Macintosh memory allocated for the WindowSet's track table.
TTcurrAddr;           Pointer to current track-table memory allocation address.
protocolHandle;       Handle to external value that allows LemurEdit to talk to the Capybara using the SOS protocol.
Table B.1: WindowSet Structure (continued).
envRateHandle;        Handle to external value that controls the SOS sound's sample increment.
triggerHandle;        Handle to external value that controls the SOS sound's trigger.
whatAmIDoing;         Current state of this WindowSet (Idling, Graphing, etc.).
regionRect;           Macintosh QuickDraw coordinates of the region selected by the user.
Table B.2: LemurWindow Structure.
Variable              Comment
myWindow;             Pointer to the Macintosh Window structure.
offScreenPix;         Pointer to an off-screen bitmap of the current graph.
gWidth;               Width in pixels of this LemurWindow.
gHeight;              Height in pixels of this LemurWindow.
pixelToFrameRatio;    Number of pixels contained within one frame (fractional).
mySubset;             Structure describing the bounds on this LemurWindow in frames and frequency.
newSubset;            Structure describing the bounds on a close-up in frames and frequency.
parent;               Pointer to this LemurWindow's WindowSet.
nextMQWin;            Pointer to the next LemurWindow in this WindowSet.
Table B.3: Header, Frame and Peak Data of a Lemur Analysis File.
Variable              Comment (units)
analysisThreshold;    This parameter specifies the noise floor of the input signal. Any peaks found below it are assumed to be noise (dB).
analysisRange;        This parameter specifies the local threshold for picking peaks. Any peaks found below the magnitude of the loudest peak minus the local threshold are thrown out (dB).
Table B.3: Header, Frame and Peak Data of a Lemur Analysis File (continued).
hysteresis;           This parameter specifies the amount of hysteresis allowed for tracks. It is the difference between the amplitude thresholds for the birth of a track and the continuation of a track (dB).
mainLobeWidth;        This is a Kaiser window parameter specifying the width of the window's main lobe (Hz).
sidelobeAttenuation;  This is also a Kaiser window parameter, specifying the side-lobe height (dB).
analysisFrameLength;  This parameter specifies the amount of time between consecutive frames (s).
originalNumSamples;   The total number of samples in the analysis.
analysisSampleRate;   The sample rate used during the analysis (Hz).
frequencyDrift;       This analysis parameter specifies the relative amount of track frequency drift allowed over the duration of a frame (percent).
Figure B.1: Lemur File Structure [16].
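For reference, the information in Table B.3 and Figure B.1 can be pictured as the C declarations below. This is only an illustrative reconstruction based on the fields listed in the table and the 16-bit peak label described in Appendix A; the actual field types, ordering, and on-disk packing are defined by Lemur [16] and are not reproduced here.

#include <stdint.h>

/* Illustrative header fields, following Table B.3; types are assumptions. */
typedef struct {
    float    analysisThreshold;    /* global noise floor (dB)                 */
    float    analysisRange;        /* local threshold for picking peaks (dB)  */
    float    hysteresis;           /* birth vs. continuation difference (dB)  */
    float    mainLobeWidth;        /* Kaiser window main-lobe width (Hz)      */
    float    sidelobeAttenuation;  /* Kaiser window side-lobe height (dB)     */
    float    analysisFrameLength;  /* time between consecutive frames (s)     */
    uint32_t originalNumSamples;   /* samples in the analyzed signal          */
    float    analysisSampleRate;   /* sample rate used during analysis (Hz)   */
    float    frequencyDrift;       /* allowed per-frame drift (percent)       */
} LemurHeader;

/* Each frame stores a count of peaks followed by the peaks themselves; each
 * peak carries a frequency, an amplitude, and the 16-bit label field that
 * LemurEdit exposes for user-defined processing. */
typedef struct {
    float    frequencyHz;
    float    amplitude;
    uint16_t label;
} LemurPeak;

typedef struct {
    uint32_t   numPeaks;
    LemurPeak *peaks;
} LemurFrame;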
This appendix describes the two sets of 16 digital violin recordings performed by Ms. Katherine Kim. Table C.1 lists the pitch interval, direction, and comments for each transition. Only one set is tabulated here, since the two sets were identical. Figures C.1 - C.32 are LemurEdit graphs of the corresponding Lemur analyses for both sets.
When analyzing the thirty-two transitions, the following analysis parameters were used: a -70 dB noise floor, a 15 dB local threshold, 15 dB of hysteresis, and a 2% allowed frequency drift.
Table C.1: Two Sets of 16 Violin Transitions
No.  Interval    Direction   Comment
1    Half-Step   Ascending   B to C on the A string (440 Hz), slurred, notes are equal length
2    Half-Step   Ascending   B to C, slurred, first note longer than second
3    Half-Step   Ascending   B to C, slurred, first note very long, 2nd note short
4    Half-Step   Ascending   B to C, slurred, gritty attack, first note shorter than second
5    Half-Step   Descending  B to C, slurred, gritty attack, both notes equal in length
6    Whole-Step  Ascending   B to C# on A string, slurred, notes equal length
7    Whole-Step  Descending  C# to B on A string, slurred, notes equal length
8    Whole-Step  Ascending   Open A to B, vibrato on second note
9    Whole-Step  Descending  B to Open A, slight vibrato on first note, second note allowed to ring
10   Half-Step   Ascending   B to C on A string, smooth, bow change, slight vibrato on both notes
11   Half-Step   Ascending   B to C, marcato, bow change
12   Half-Step   Ascending   B to C, separated bow change, accented
13   Half-Step   Ascending   B to C, slurred, no vibrato
14   Half-Step   Ascending   B to C, slurred, no vibrato
15   Half-Step   Descending  C to B, slurred, no vibrato
16   Half-Step   Ascending   B to C on the A string, slurred, notes are equal length
Transition #2
Transition #3
Transition #4
Transition #5
Transition #6
Transition #7
Transition #8
Transition #9
Transition #10
Transition #11
Transition #12
Transition #13
Transition #14
Transition #15
Transition #16
Transition #1
Transition #2
(Not shown. Clipping existed in the digital recording, so no analysis was attempted.)
Transition #3
Transition #4
Transition #5
Transition #6
Transition #7
Transition #8
Transition #9
Transition #10
Transition #11
Transition #12
Transition #13
Transition #14
Transition #15
Transition #16
[1] J. Strawn, "Modeling musical transitions," Ph.D. dissertation, Stanford University, Stanford, CA, 1985.
[2] J. T. Sciarabba, "Psychoacoustics in sound synthesis," M.S. thesis, University of Illinois at Urbana-Champaign, Urbana, IL, 1991.
[3] S. C. Glinski, "Diphone speech synthesis based on pitch-adaptive short-time Fourier transform," Ph.D. dissertation, University of Illinois at Urbana-Champaign, Urbana, IL, 1978.
[4] J. M. Grey, "An exploration of musical timbre," Ph.D. dissertation, Stanford University, Stanford, CA, 1975.
[5] J. M. Chowning, "The synthesis of complex audio spectra by means of frequency modulation," Journal of the Audio Engineering Society, vol. 21, no. 7, pp. 526-534, 1973.
[6] J. Woodhouse, "Physical modeling of bowed strings," Computer Music Journal, vol. 16, no. 4, pp. 43-56, Winter 1992.
[7] J. W. Beauchamp, "A computer system for time-variant harmonic analysis and synthesis of musical tones," in Music by Computers, Heinz von Foerster and James W. Beauchamp, Eds. New York: Wiley, 1969, pp. 19-62.
[8] J. W. Beauchamp, "Data reduction and resynthesis of connected solo passages using frequency, amplitude, and `brightness' detection and the nonlinear synthesis technique," in Proceedings of the 1981 International Computer Music Conference, Larry Austin and Thomas Clark, Eds. Denton, Texas: North Texas State University, pp. 316-323, 1981.
[9] J. Strawn, "Analysis and synthesis of musical transitions using the discrete short-time Fourier transform," Journal of the Audio Engineering Society, vol. 35, no. 1/2, pp. 3-13, 1987.
[10] R. J. McAulay and T. F. Quatieri, "Speech analysis/synthesis based on a sinusoidal representation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-34, no. 4, pp. 744-754, August 1986.
[11] K. Fitz, "A sinusoidal model for time- and frequency-scale modification in computer music," M.S. thesis, University of Illinois at Urbana-Champaign, Urbana, IL, 1992.
[12] K. Fitz, Lemur Manual. CERL Sound Group, University of Illinois, 1993.
[13] J. G. Roederer, Introduction to the Physics and Psychoacoustics of Music, 2nd ed. Heidelberg: Springer-Verlag, 1975.
[14] R. C. Maher, "An approach for the separation of voices in composite musical signals," Ph.D. dissertation, University of Illinois at Urbana-Champaign, Urbana, IL, 1989.
[15] R. Maher and J. Beauchamp, "An investigation of vocal vibrato for synthesis," Applied Acoustics, vol. 30, pp. 219-245, 1990.
[16] S. Berkeley, QuickMQ Manual. Dartmouth College, 1994.
[17] J. A. Moorer and J. Grey, "Lexicon of analyzed tones," Computer Music Journal, vol. 2, no. 2, pp. 23-31, September 1978.