Sound Morphing Using Loris

Many kinds of sound transformations have been characterized as sound morphing. Generally, these transformations involve two (or more) source timbres and produce a sound that has timbral characteristics of all the sources.

We define sound morphing more narrowly to be a transformation between source timbres governed by a morphing envelope, bounded by 0 and 1, in which the original source timbres are obtained at the extrema of the morphing envelope. That is, one source sound is produced when the envelope has value 0, the other source is produced when the envelope has value 1, and hybrid sounds are obtained from intermediate envelope values.

Sound morphing using traditional additive sound models is straightforward. For quasi-harmonic sounds, in which each harmonic is represented by a single sinusoidal partial, the time-varying frequencies and amplitudes of the quasi-harmonic partials in the morphed sound can be obtained by a weighted interpolation of the time-varying frequencies and amplitudes of corresponding partials in the source sounds.

In Loris, though the process of partial construction is different, the morphing process is fundamentally similar. Sound morphing is achieved by interpolating the time-varying frequency, amplitude, and bandwidth (or noisiness) of corresponding partials obtained from analyses of the source sounds.
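To make the interpolation concrete, here is a minimal C sketch that morphs one pair of corresponding breakpoint values, with separate weights for frequency, amplitude, and bandwidth. The `BreakpointParams` type and the `interpolate` and `morphParams` functions are hypothetical illustrations, not part of the Loris API, and the sketch assumes plain linear interpolation; Loris itself may treat some parameters (amplitude, for instance) differently. A weight of 0 reproduces the first source and a weight of 1 reproduces the second, matching the definition of morphing given above.

```c
/* Hypothetical breakpoint parameters for illustration (not the Loris Breakpoint type). */
typedef struct {
    double freq;   /* frequency in Hz */
    double amp;    /* amplitude */
    double bw;     /* bandwidth (noisiness) */
} BreakpointParams;

/* Weighted linear interpolation: w == 0 yields a exactly, w == 1 yields b exactly. */
static double interpolate( double a, double b, double w )
{
    return ( 1.0 - w ) * a + w * b;
}

/* Morph one pair of corresponding breakpoints, using separate weights
   (each assumed to lie in [0, 1]) for frequency, amplitude, and bandwidth. */
static BreakpointParams morphParams( BreakpointParams src0, BreakpointParams src1,
                                     double wFreq, double wAmp, double wBw )
{
    BreakpointParams out;
    out.freq = interpolate( src0.freq, src1.freq, wFreq );
    out.amp  = interpolate( src0.amp,  src1.amp,  wAmp );
    out.bw   = interpolate( src0.bw,   src1.bw,   wBw );
    return out;
}
```

For example, morphing a 440 Hz breakpoint with a 330 Hz breakpoint at a frequency weight of 0.5 yields 385 Hz.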

Before proceeding with this tutorial, make sure that you have understood the Loris sound modeling tutorial.

Establishing Partial Correspondences

The description of sounds as quasi-harmonic implies a natural correspondence between partials having the same harmonic number. For non-harmonic or polyphonic sounds, however, there may be no obvious correspondence between partials in the source sounds, or there may be many possible correspondences. Loris provides mechanisms for explicitly establishing correspondences between source partials.

Correspondences between partials in the source sounds are established by labeling the partials in the source sounds. Partials in each source sound are assigned unique identifiers, or labels, and partials having the same label are morphed by interpolating their frequency, amplitude, and bandwidth envelopes according to the morphing envelope.

The product of a morph is a new set of partials, consisting of a single partial for each label represented in any of the source sounds.

Channelization

Channelization is an automated process of labeling the partials in an analyzed sound. Partials can be labeled one by one, but if the sound has a known, simple frequency structure, an automated process is much more efficient.

For quasi-harmonic sounds, partials are most often labeled according to their harmonic number. The frequency spectrum is partitioned into non-overlapping channels having center frequencies that are harmonic (integer) multiples of the fundamental frequency, and each channel is identified by a unique label equal to its harmonic number. Each partial is assigned the label corresponding to the channel containing the greatest portion of its (the partial's) energy.

For example, if the fundamental frequency of a tone is 110 Hz, then channel 1 is centered at 110 Hz, channel 2 at 220 Hz, channel 3 at 330 Hz, and so on. The width of each channel is equal to the difference between adjacent center frequencies, 110 Hz in this example, so each channel covers a region extending 55 Hz above and below its center frequency.
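The channel arithmetic can be sketched in C as follows. This is an illustration under stated assumptions, not the Loris channelizer: `channelOf` rounds the ratio of a frequency to the fundamental, and `labelByEnergy` approximates a partial's energy in each channel by summing squared breakpoint amplitudes (assuming roughly uniform breakpoint spacing). Both function names and `MAX_CHANNELS` are hypothetical.

```c
/* Hypothetical upper bound on the number of channels considered. */
#define MAX_CHANNELS 64

/* Channel k spans [(k - 0.5) * f0, (k + 0.5) * f0), so rounding freq / f0
   to the nearest integer selects the channel. Frequencies below f0 / 2
   map to 0 (unlabeled). Assumes freq >= 0 and f0 > 0. */
static int channelOf( double freq, double f0 )
{
    return (int)( freq / f0 + 0.5 );
}

/* Label a partial, described by n breakpoint frequencies and amplitudes,
   with the channel holding the largest share of its energy. Energy is
   approximated by the sum of squared amplitudes (an assumption). */
static int labelByEnergy( const double * freqs, const double * amps, int n, double f0 )
{
    double energy[ MAX_CHANNELS ] = { 0 };
    int i, k, best = 0;

    for ( i = 0; i < n; ++i )
    {
        k = channelOf( freqs[ i ], f0 );
        if ( k > 0 && k < MAX_CHANNELS )
            energy[ k ] += amps[ i ] * amps[ i ];
    }
    /* pick the channel with the most energy; 0 if none */
    for ( k = 1; k < MAX_CHANNELS; ++k )
        if ( energy[ k ] > energy[ best ] )
            best = k;
    return best;
}
```

With a 110 Hz fundamental, `channelOf( 164.0, 110.0 )` gives 1 and `channelOf( 166.0, 110.0 )` gives 2, because the boundary between channels 1 and 2 lies at 165 Hz; anything below 55 Hz maps to 0, the unlabeled region.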

Since the fundamental frequency may vary over time, the center frequency of the channel is specified in Loris by a `LinearEnvelope`. To construct a LinearEnvelope having a constant value (like 110 Hz), you can add only a single breakpoint to the envelope, as in

```c
env = createLinearEnvelope();
linearEnvelope_insertBreakpoint( env, 0, 110 );
```

The envelope can be made more interesting, if necessary, by adding more breakpoints to track changes in the fundamental frequency of the sound. A spectrogram tool is sometimes useful for estimating changes in fundamental frequency.
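As an illustration of why a single breakpoint yields a constant envelope, the following hypothetical `Envelope` type evaluates a piecewise-linear function of time. It is not the Loris `LinearEnvelope` implementation; it assumes, as we understand `LinearEnvelope` to behave, that values are held constant before the first and after the last breakpoint.

```c
/* Hypothetical piecewise-linear envelope, for illustration only.
   Breakpoint times must be strictly increasing; n must be at least 1. */
typedef struct {
    const double * times;
    const double * values;
    int n;
} Envelope;

/* Value at time t: linear interpolation between neighboring breakpoints,
   held constant before the first and after the last breakpoint. */
static double envelopeValueAt( const Envelope * env, double t )
{
    int i;
    if ( t <= env->times[ 0 ] )
        return env->values[ 0 ];
    for ( i = 1; i < env->n; ++i )
    {
        if ( t < env->times[ i ] )
        {
            double alpha = ( t - env->times[ i - 1 ] ) /
                           ( env->times[ i ] - env->times[ i - 1 ] );
            return env->values[ i - 1 ] +
                   alpha * ( env->values[ i ] - env->values[ i - 1 ] );
        }
    }
    return env->values[ env->n - 1 ];
}
```

With one breakpoint (110 Hz at time 0), the loop never finds a later breakpoint, so every query returns 110 Hz; adding breakpoints lets the value follow a changing fundamental.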

Such an envelope is used as a reference frequency envelope in the channelization process. The `channelize` function in the Loris procedural interface uses a reference frequency envelope to define a full set of frequency channels and label a set of partials according to those channels.

```c
channelize( partials, env, 1 );
```

The third argument to `channelize` is the channel number whose center frequency is described by the reference frequency envelope (1 if it is the fundamental, 3 if it is the third harmonic, etc.). The other channels are constructed around all the other harmonics.
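The relationship between the reference envelope and the other channels amounts to simple scaling. In this sketch, which assumes strictly harmonic channels (the helpers `channelCenter` and `labelFor` are hypothetical, not Loris functions), the reference value is divided by its channel number to recover the fundamental, which is then multiplied by each channel number.

```c
/* Center frequency of channel k when the reference frequency refFreq
   describes channel refLabel (1 = fundamental, 3 = third harmonic, ...). */
static double channelCenter( double refFreq, int refLabel, int k )
{
    return k * ( refFreq / refLabel );
}

/* Channel (label) containing freq, rounding to the nearest channel center.
   Assumes freq >= 0, refFreq > 0, and refLabel >= 1. */
static int labelFor( double freq, double refFreq, int refLabel )
{
    return (int)( freq * refLabel / refFreq + 0.5 );
}
```

If the reference envelope tracks the third harmonic at 330 Hz, channel 1 is centered at 110 Hz and a partial at 440 Hz falls in channel 4.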

Note that, due to the symmetry of the frequency channels, there is a frequency region below half the center frequency of the first channel that is not covered by any channel, so very low-frequency partials may remain unlabeled after channelization. In practice, partials are rarely found in this region, and their presence generally indicates a poor choice of analysis or channelization parameters. The frequency floor parameter of the Loris analyzer can be used to ensure that no such low-frequency partials are constructed.

Automated Reference Envelope Construction

The reference (fundamental) frequency envelope for channelization can often be constructed automatically using a fundamental frequency tracker. In such cases, the `createF0Estimate` function in the Loris procedural interface can be used to construct the reference frequency envelope automatically.

```c
/* create an envelope that tracks the fundamental frequency
   between 80 and 140 Hz, sampled every 10 ms */
env = createF0Estimate( partials, 80, 140, 0.01 );
```

`createF0Estimate` estimates the fundamental in a specified range of frequencies and constructs a reference frequency envelope from samples of this estimate at the specified time interval.

The second and third arguments to `createF0Estimate` are the minimum and maximum frequencies of the range in which the fundamental is estimated; the estimate, and therefore the reference envelope, is constrained to this range.

The fourth argument is the interval (in seconds) at which the fundamental frequency is estimated.

Channelizing by Hand

The function `forEachPartial` in the Loris procedural interface uses a callback function to perform some operation on each partial in a `PartialList`.

For example, the following callback labels each partial that is active between 1 and 3 seconds with its harmonic number. The fundamental frequency is passed in through the `void *` parameter `data`.

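```c
/* Label partials active between 1 and 3 seconds with their harmonic
   number; data points to a double holding the fundamental frequency. */
int labelHarmonic( Partial * p, void * data )
{
    double fund = *(double *)data;
    double freq;
    int harmnum;

    /* is the partial active at some time between 1 and 3 seconds? */
    if ( partial_startTime( p ) < 3.0 &&
         partial_endTime( p ) > 1.0 )
    {
        /* round the frequency ratio at 1 second to the nearest harmonic */
        freq = partial_frequencyAt( p, 1.0 );
        harmnum = (int)( (freq / fund) + .5 );
        partial_setLabel( p, harmnum );
    }
    return 0;   /* zero means: keep visiting partials */
}
```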

The functions `partial_startTime` and `partial_endTime` are invoked to determine whether a partial is active between 1 and 3 seconds. `partial_frequencyAt` returns the interpolated frequency of a partial at the specified time (1 second here). `partial_setLabel` sets the label assigned to a partial.

To invoke this function on each partial in a `PartialList`, `labelHarmonic` is passed to `forEachPartial`, along with the fundamental frequency.

```c
double f0 = 125;

forEachPartial( partials, labelHarmonic, &f0 );
```

The first argument to `forEachPartial` is the `PartialList` whose partials are to be visited, and the second is the callback function to invoke on each partial. The final argument is the address of the callback function data; in this case, it is the address of a `double` variable storing the fundamental frequency. If no callback data is needed, 0 should be passed as the final argument.

The callback function should return 0 unless the iteration should stop. If the callback returns a non-zero value, that value is immediately returned by `forEachPartial`, and the callback is not invoked on any more partials. If the callback visits all the partials in a `PartialList`, returning zero each time, then `forEachPartial` returns zero.
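The early-exit protocol can be illustrated without the Loris library. The sketch below uses a hypothetical `MiniPartial` type and a `forEachMiniPartial` function that mirrors the return-value rules just described; it is not the Loris implementation. The example callback counts the partials it visits through its data pointer and stops at the first partial starting after a given time.

```c
/* Hypothetical stand-in for a partial: only its time span. */
typedef struct {
    double start, end;   /* seconds */
} MiniPartial;

typedef int ( * MiniCallback )( MiniPartial * p, void * data );

/* Visit each partial in order; return the first non-zero callback result
   immediately, or 0 if the callback returned 0 for every partial. */
static int forEachMiniPartial( MiniPartial * parts, int n, MiniCallback func, void * data )
{
    int i, result;
    for ( i = 0; i < n; ++i )
    {
        result = func( &parts[ i ], data );
        if ( result != 0 )
            return result;
    }
    return 0;
}

/* Callback data: a time limit and a running count of visited partials. */
typedef struct {
    double limit;
    int visited;
} StopData;

/* Count each visit; stop (return 1) at the first partial starting after limit. */
static int countUntilLate( MiniPartial * p, void * data )
{
    StopData * d = (StopData *)data;
    d->visited += 1;
    return ( p->start > d->limit ) ? 1 : 0;
}
```

Passing the address of a `StopData` variable plays the same role as passing `&f0` to `forEachPartial`: the callback reads and updates state that outlives each individual call.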

There is also a function `forEachBreakpoint` that can be used to invoke a callback function on each breakpoint in a partial.

`forEachPartial` and `forEachBreakpoint` are powerful tools for analyzing and processing modeled sounds. With clever construction of callback functions and callback function data structures, you can effect a great variety of manipulations and transformations of Loris analysis data.

Distillation

The sound morphing algorithm in Loris requires that the partials in each source be labeled uniquely, that is, no two partials in a single source can have the same label.

Distillation is the process for enforcing this condition. All partials identified with a particular channel, and therefore having a common label, are fused, or distilled, into a single partial, leaving at most one partial per frequency channel and label.

The `distill` function in the Loris procedural interface distills a set of labeled partials.

```c
distill( partials );
```

When the partials in a frequency channel do not overlap in time, distillation simply links them end to end, inserting silence between the end of one partial and the start of the next.

When partials in a frequency channel overlap temporally, the strongest (i.e. having the most energy) of the overlapping partials is selected to construct the distilled partial. The energy in the weaker partials is absorbed as noise energy by the stronger partial.
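A simplified view of which partial the distilled partial follows at a given time can be sketched in C. The `Span` type and `distilledSourceAt` are hypothetical, energy is assumed to be precomputed (for example, as a sum of squared amplitudes), and the sketch ignores the absorption of the weaker partials' energy as noise, so it is not the Loris distiller's actual algorithm.

```c
/* Hypothetical summary of one partial sharing a given label. */
typedef struct {
    double start, end;   /* active time span in seconds */
    double energy;       /* total energy, assumed precomputed */
} Span;

/* Index of the partial that the distilled partial follows at time t:
   the strongest partial active at t, or -1 if none is active
   (the silence inserted between linked, non-overlapping partials). */
static int distilledSourceAt( const Span * spans, int n, double t )
{
    int i, best = -1;
    for ( i = 0; i < n; ++i )
    {
        if ( spans[ i ].start <= t && t <= spans[ i ].end &&
             ( best < 0 || spans[ i ].energy > spans[ best ].energy ) )
            best = i;
    }
    return best;
}
```

With partials spanning 0 to 1 s (energy 2), 0.5 to 1.5 s (energy 5), and 2 to 3 s (energy 1), the distilled partial follows the first until the stronger second one begins, is silent between 1.5 and 2 s, and then follows the third.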

Sifting

In some cases, the energy redistribution effected by the distiller is undesirable. In such cases, the partials can be sifted before distillation. Sifting is performed by the `sift` function in the Loris procedural interface.

```c
sift( partials );
```

In the sifting process, partials that would be rejected in distillation are assigned the label 0. These sifted partials can then be identified and treated separately, removed altogether (using the `removeLabeled` function in the Loris procedural interface), or simply left unlabeled.
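The sifting rule can be sketched as follows, assuming that a partial is rejected when it overlaps a stronger (higher-energy) partial with the same label, mirroring the distillation rule described in this tutorial. The `SiftPartial` type and `siftByEnergy` are hypothetical, and the criterion Loris actually applies in `sift` may differ.

```c
#include <stdlib.h>

/* Hypothetical partial summary: time span, energy, and label (0 = unlabeled). */
typedef struct {
    double start, end;
    double energy;
    int label;
} SiftPartial;

/* Non-zero if the two time spans overlap. */
static int overlaps( const SiftPartial * a, const SiftPartial * b )
{
    return a->start < b->end && b->start < a->end;
}

/* Set label 0 on each labeled partial that overlaps a stronger partial with
   the same label. Decisions are made on the original labels first, so the
   result does not depend on visiting order. Returns the number of partials
   sifted, or -1 if memory allocation fails. */
static int siftByEnergy( SiftPartial * ps, int n )
{
    int i, j, count = 0;
    int * reject = calloc( n > 0 ? (size_t)n : 1, sizeof( int ) );
    if ( reject == NULL )
        return -1;
    for ( i = 0; i < n; ++i )
        for ( j = 0; j < n; ++j )
            if ( i != j && ps[ i ].label != 0 && ps[ i ].label == ps[ j ].label &&
                 ps[ j ].energy > ps[ i ].energy && overlaps( &ps[ i ], &ps[ j ] ) )
                reject[ i ] = 1;
    for ( i = 0; i < n; ++i )
        if ( reject[ i ] )
        {
            ps[ i ].label = 0;
            ++count;
        }
    free( reject );
    return count;
}
```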

Collating

Distilling affects only labeled partials. Unlabeled partials (partials labeled 0) are not processed by the distiller; they are simply gathered together at the end of the collection, after all the distilled, labeled partials.

Collating is a process similar to distilling, but affecting only unlabeled partials. Collating joins non-overlapping unlabeled partials to produce the smallest-possible number of partials.

Collating helps reduce the amount of partial data exported and can speed up some operations. Because only non-overlapping partials are joined, collating does not change the sound: the samples generated by synthesizing the partials are the same before and after collating.

Collating is performed by the `collate` function in the Loris procedural interface.

```c
collate( partials );
```

Collated partials, like all unlabeled partials, do not participate in morphing.
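Joining non-overlapping partials into the fewest chains is an instance of the classic interval-partitioning problem. The sketch below counts the chains a greedy strategy produces: sort by start time and append each partial to any chain that has already ended. The `Interval` type and `collateCount` are hypothetical, and the Loris collator may use a different strategy (for example, requiring a minimum gap between joined partials).

```c
#include <stdlib.h>

/* Hypothetical time span of one unlabeled partial, in seconds. */
typedef struct {
    double start, end;
} Interval;

/* qsort comparator: ascending start time. */
static int compareStart( const void * a, const void * b )
{
    double sa = ( (const Interval *)a )->start;
    double sb = ( (const Interval *)b )->start;
    return ( sa > sb ) - ( sa < sb );
}

/* Sort iv by start time (in place), then append each interval to the first
   chain whose last interval ended strictly before it starts, opening a new
   chain only when no chain is free. Returns the number of chains, or -1 if
   memory allocation fails. */
static int collateCount( Interval * iv, int n )
{
    int i, j, chains = 0;
    double * chainEnd;

    if ( n <= 0 )
        return 0;
    chainEnd = malloc( (size_t)n * sizeof( double ) );
    if ( chainEnd == NULL )
        return -1;
    qsort( iv, (size_t)n, sizeof( Interval ), compareStart );
    for ( i = 0; i < n; ++i )
    {
        for ( j = 0; j < chains; ++j )
            if ( chainEnd[ j ] < iv[ i ].start )
                break;
        if ( j == chains )
            ++chains;                /* no free chain: open a new one */
        chainEnd[ j ] = iv[ i ].end; /* chain j now ends where this interval ends */
    }
    free( chainEnd );
    return chains;
}
```

The greedy result equals the largest number of partials active at any one instant, which is the smallest number any joining scheme can achieve.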

Temporal Feature Alignment

Significant temporal features of the source sounds must be synchronized in order to achieve good morphing results. If related temporal features (such as the end of the attack, or the beginning of the release) occur at different times in two sounds, then a morph between those sounds will be unsatisfying.

Loris provides a dilation mechanism, the `dilate` function, for non-uniformly expanding and contracting partials to redistribute temporal events. For example, when morphing instrument tones, it is common to align the attack, sustain, and release portions of the source sounds by dilating or contracting those temporal regions.

Arguments to `dilate` are the partials to transform and two sets of time points: the times before the transformation and the times after the transformation.

For example, to stretch the middle part of the cat's meow (which is 1.2 seconds long), you might make the following declarations:

```c
/* time points for dilation */
double itimes[] = { 0, 0.25, 0.75, 1 };
double ttimes[] = { 0, 0.25, 2.75, 3 };
const int ntimes = 4;
```

and then invoke `dilate` this way

```c
dilate( partials, itimes, ttimes, ntimes );
```

The final argument is the number of time points in each array (both arrays must, of course, be the same size).
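The effect of the two time-point arrays can be sketched as a piecewise-linear time warp. In the example above, the 0.5-second region from 0.25 to 0.75 seconds is mapped onto 0.25 to 2.75 seconds, a five-fold stretch, while the final segment keeps its 0.25-second length and is delayed by 2 seconds. The `warpTime` function below is a hypothetical illustration, not the Loris `dilate` implementation, and its handling of times outside the first and last points (a constant shift) is an assumption.

```c
/* Map time t through the piecewise-linear warp defined by matching arrays
   of initial times (itimes) and target times (ttimes), both strictly
   increasing, with ntimes >= 1 entries. Times before the first point or
   after the last point are shifted by the offset at that point
   (an assumption about edge handling). */
static double warpTime( double t, const double * itimes, const double * ttimes, int ntimes )
{
    int i;
    if ( t <= itimes[ 0 ] )
        return t + ( ttimes[ 0 ] - itimes[ 0 ] );
    for ( i = 1; i < ntimes; ++i )
    {
        if ( t < itimes[ i ] )
        {
            /* linear map of [itimes[i-1], itimes[i]) onto [ttimes[i-1], ttimes[i]) */
            return ttimes[ i - 1 ] + ( t - itimes[ i - 1 ] ) *
                   ( ttimes[ i ] - ttimes[ i - 1 ] ) / ( itimes[ i ] - itimes[ i - 1 ] );
        }
    }
    return t + ( ttimes[ ntimes - 1 ] - itimes[ ntimes - 1 ] );
}
```

With the declarations above, a breakpoint at 0.5 seconds moves to 1.5 seconds, and one at 1.2 seconds (near the end of the 1.2-second meow) moves to 3.2 seconds.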

Dilation can also be used to synchronize a sound morph with a visual sequence, such as a computer animation. Temporal features of the morphed sound must be aligned with visual events in the animation in order to make the relationship between sound and visuals believable or seemingly "natural".

Dilation can be performed before or after distillation; in either case, it is an essential component in controlling the evolution of the morph.

Final Preparations

The sources in a sound morph need not be distilled using identical sets of frequency channels. However, large frequency sweeps will dominate other audible effects of the morph, so care must be taken to coordinate the frequency channels used in the distillation process. For example, quasi-harmonic sounds of different pitches may be pitch-aligned (shifted to a common pitch) before morphing:

```c
/* shift pitch up by 100 cents */
env = createLinearEnvelope();
linearEnvelope_insertBreakpoint( env, 0, 100 );
shiftPitch( partials, env );
```

Though the harmonic frequency structure described by the channelization process may not be a good representation of the frequency structure of a particular sound (as in the case of a non-harmonic bell sound, for example), it may still yield good morphing results by labeling partials in such a way as to prevent dramatic frequency sweeps.

In some cases, the dramatic effect of a morph and its apparent "realism" are enhanced by applying frequency or amplitude deformations that are synchronous with the evolution of the morph. This enhanced realism is particularly important when the sound morph is to be coupled with animation.

Performing the Morph

Labeled and distilled sets of partials are morphed by interpolating the envelopes of corresponding partials according to specified morphing envelopes. There are separate morphing envelopes controlling the interpolation of frequency, amplitude, and bandwidth (or noisiness), though in practice, the same envelope is often used to control all three parameters.

The morphing envelopes are represented by `LinearEnvelope`s in the Loris procedural interface. The following statements construct a morphing envelope that evolves from the first source to the second between 0.5 and 1.5 seconds.

```c
env = createLinearEnvelope();
linearEnvelope_insertBreakpoint( env, 0.5, 0 );
linearEnvelope_insertBreakpoint( env, 1.5, 1 );
```

The `morph` function in the Loris procedural interface performs the morph between two sets of prepared partials.

```c
morph( src0, src1, env, env, env, dest );
```

The first two arguments are the source partial sets in the morph. The next three arguments are the morphing envelopes for frequency, amplitude, and noisiness, respectively. (In general, it is reasonable to use the same envelope for amplitude and noisiness, and often for all three parameters.) The final argument is the `PartialList` that will store the morphed partials.

Partials in one distilled source that have no corresponding partial in the other source(s) are crossfaded according to the amplitude morphing envelope.

Source partials may also be unlabeled, or assigned the label 0, to indicate that they have no correspondence with other sources in the morph. All unlabeled partials in a morph are crossfaded according to the amplitude morphing envelope. Only the amplitude (not the frequency or bandwidth) of unlabeled partials is affected by the morph.
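The crossfade applied to unmatched and unlabeled partials can be sketched as a linear amplitude weighting: a partial from the first source is scaled by 1 - w and one from the second source by w, where w is the current value of the amplitude morphing envelope, while frequency and bandwidth pass through unchanged. The `Bp` type and both functions are hypothetical, and the linear gain law is an assumption about how Loris crossfades.

```c
/* Hypothetical breakpoint parameters for illustration. */
typedef struct {
    double freq, amp, bw;
} Bp;

/* Amplitude gain for a partial with no counterpart in the other source.
   fromSource is 0 or 1; w is the amplitude morph weight in [0, 1]. */
static double crossfadeGain( int fromSource, double w )
{
    return ( fromSource == 0 ) ? ( 1.0 - w ) : w;
}

/* Apply the crossfade to one breakpoint: only the amplitude changes. */
static Bp crossfadeBreakpoint( Bp bp, int fromSource, double w )
{
    bp.amp *= crossfadeGain( fromSource, w );
    return bp;
}
```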

Morphing Example

This program performs a morph between a flute tone and a clarinet tone. The flute and the clarinet were analyzed previously, and their partials stored in Sound Description Interchange Format (SDIF) files. The partials are imported, channelized, distilled, pitch aligned, dilated, and morphed. The morphed partials are rendered and the samples exported to a new AIFF file.

```c
#include "loris.h"

#include <stdio.h>
#include <stdlib.h>

int main( void )
{
#define BUFSZ (3*44100)
    /* clarinet is about 3 seconds; static storage keeps this
       roughly 1 MB buffer off the stack */
    static double samples[ BUFSZ ];
    unsigned int N = 0;

    PartialList * clar = createPartialList();
    PartialList * flut = createPartialList();
    LinearEnvelope * reference = 0;
    LinearEnvelope * pitchenv = createLinearEnvelope();

    LinearEnvelope * morphenv = createLinearEnvelope();
    PartialList * mrph = createPartialList();

    /* time points (seconds) used to align the onsets of the two sources */
    double flute_times[] = {0.4, 1.};
    double clar_times[] = {0.2, 1.};
    double tgt_times[] = {0.3, 1.2};

    /* import the raw clarinet partials */
    printf( "importing clarinet partials\n" );
    importSdif( "clarinet.sdif", clar );

    /* channelize and distill */
    printf( "distilling\n" );
    reference = createF0Estimate( clar, 350, 450, 0.01 );
    channelize( clar, reference, 1 );
    distill( clar );
    destroyLinearEnvelope( reference );
    reference = 0;

    /* shift pitch of clarinet partials */
    printf( "shifting pitch of clarinet partials down by 600 cents\n" );
    linearEnvelope_insertBreakpoint( pitchenv, 0, -600 );
    shiftPitch( clar, pitchenv );
    destroyLinearEnvelope( pitchenv );
    pitchenv = 0;

    /* import the raw flute partials */
    printf( "importing flute partials\n" );
    importSdif( "flute.sdif", flut );

    /* channelize and distill */
    printf( "distilling\n" );
    reference = createF0Estimate( flut, 250, 320, 0.01 );
    channelize( flut, reference, 1 );
    distill( flut );
    destroyLinearEnvelope( reference );
    reference = 0;

    /* align onsets */
    printf( "dilating sounds to align onsets\n" );
    dilate( clar, clar_times, tgt_times, 2 );
    dilate( flut, flute_times, tgt_times, 2 );

    /* perform morph */
    printf( "morphing clarinet with flute\n" );
    linearEnvelope_insertBreakpoint( morphenv, 0.6, 0 );
    linearEnvelope_insertBreakpoint( morphenv, 2, 1 );
    morph( clar, flut, morphenv, morphenv, morphenv, mrph );

    /* synthesize and export samples */
    printf( "synthesizing %lu morphed partials\n",
            partialList_size( mrph ) );
    N = synthesize( mrph, samples, BUFSZ, 44100 );
    exportAiff( "morph.aiff", samples, N, 44100, 16 );

    /* cleanup */
    destroyPartialList( mrph );
    destroyPartialList( clar );
    destroyPartialList( flut );
    destroyLinearEnvelope( morphenv );

    printf( "Done, bye.\n\n" );
    return 0;
}
```