Sound Morphing Using Loris
Many kinds of sound transformations have been characterized as sound morphing. Generally, these transformations involve two (or more) source timbres and produce a sound that has timbral characteristics of both sources.
We define sound morphing more narrowly to be a transformation between source timbres governed by a morphing envelope, bounded by 0 and 1, in which the original source timbres are obtained at the extrema of the morphing envelope. That is, one source sound is produced when the envelope has value 0, the other source is produced when the envelope has value 1, and hybrid sounds are obtained from intermediate envelope values.
Sound morphing using traditional additive sound models is straightforward. For quasi-harmonic sounds, in which each harmonic is represented by a single sinusoidal partial, the time-varying frequencies and amplitudes of the quasi-harmonic partials in the morphed sound can be obtained by a weighted interpolation of the time-varying frequencies and amplitudes of corresponding partials in the source sounds.
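Schematically, if α(t) is the morphing envelope and f1(t) and f2(t) are the time-varying frequencies of a pair of corresponding partials, then the morphed frequency is f(t) = ( 1 − α(t) ) · f1(t) + α(t) · f2(t), and amplitudes are interpolated analogously.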
In Loris, though the process of partial construction is different, the morphing process is fundamentally similar. Sound morphing is achieved by interpolating the time-varying frequency, amplitude, and bandwidth (or noisiness) of corresponding partials obtained from analyses of the source sounds.
Before proceeding with this tutorial, make sure that you have understood the Loris sound modeling tutorial.
Establishing Partial Correspondences
The description of sounds as quasi-harmonic implies a natural correspondence between partials having the same harmonic number. For non-harmonic or polyphonic sounds, however, there may be no obvious correspondence between partials in the source sounds, or there may be many possible correspondences. Loris provides mechanisms for explicitly establishing correspondences between source partials.
Correspondences between partials in the source sounds are established by labeling the partials in the source sounds. Partials in each source sound are assigned unique identifiers, or labels, and partials having the same label are morphed by interpolating their frequency, amplitude, and bandwidth envelopes according to the morphing envelope.
The product of a morph is a new set of partials, consisting of a single partial for each label represented in any of the source sounds.
Channelization
Channelization is an automated process of labeling the partials in an analyzed sound. Partials can be labeled one by one, but if the sound has a known, simple frequency structure, an automated process is much more efficient.
For quasi-harmonic sounds, partials are most often labeled according to their harmonic number. The frequency spectrum is partitioned into non-overlapping channels having center frequencies that are harmonic (integer) multiples of the fundamental frequency, and each channel is identified by a unique label equal to its harmonic number. Each partial is assigned the label corresponding to the channel containing the greatest portion of its (the partial's) energy.
For example, if the fundamental frequency of a tone is 110 Hz, then channel 1 is centered at 110 Hz, channel 2 at 220 Hz, channel 3 at 330 Hz, and so on. The width of the channels is equal to the difference between adjacent center frequencies, 110 Hz in this example, so each channel covers a region 55 Hz above and below its center frequency.
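As a sketch of this arithmetic (illustrative only, since channelization actually weighs each partial's energy distribution over time rather than a single frequency value):
# illustrative only: channelize assigns labels using each partial's
# energy distribution, not a single frequency value
f0 = 110.0                           # channel 1 center frequency, in Hz
freq = 335.0                         # a frequency of interest
channel = int( round( freq / f0 ) )  # 335 Hz lies in channel 3 (330 +/- 55 Hz)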
Since the fundamental frequency may vary over time, the center frequency of a channel is specified in Loris by a LinearEnvelope.
To construct a LinearEnvelope having a constant value (like 110 Hz), you need only add a single breakpoint to the envelope, as in
env = loris.LinearEnvelope()
env.insertBreakpoint( 0, 110 )
The envelope can be made more interesting, if necessary, by adding more breakpoints to track changes in the fundamental frequency of the sound. A spectrogram tool is sometimes useful for estimating changes in fundamental frequency.
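For example, a reference envelope for a tone whose fundamental drifts slightly might be built like this (the times and frequencies here are hypothetical):
# hypothetical fundamental estimates, e.g. read from a spectrogram
env = loris.LinearEnvelope()
env.insertBreakpoint( 0.0, 110 )   # 110 Hz at the onset
env.insertBreakpoint( 0.8, 112 )   # drifting slightly sharp
env.insertBreakpoint( 1.5, 108 )   # settling slightly flat at the end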
Such an envelope is used as a reference frequency envelope in the channelization process. The channelize function in the Loris procedural interface uses a reference frequency envelope to define a full set of frequency channels and to label a set of partials according to those channels.
loris.channelize( partials, env, 1 )
The third argument to channelize is the number of the channel whose center frequency is described by the reference frequency envelope (1 if it is the fundamental, 3 if it is the third harmonic, and so on). The other channels are constructed around all the other harmonics.
Note that, due to the symmetry of the frequency channels, there is a frequency region below half the frequency of the first channel that is not covered by any channel, so very low-frequency partials may remain unlabeled after channelization. In practice, it is unusual to find any partials in this region, and their presence generally indicates a poor choice of analysis or channelization parameters. The frequency floor parameter of the Loris analyzer can be used to ensure that no such low-frequency partials are constructed.
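For example, a minimal sketch (assuming the one-argument Analyzer constructor and its setFreqFloor method, and that the source samples have been obtained elsewhere):
import loris

# analyze with a frequency resolution of 110 Hz, constructing no
# partials below 90 Hz; 'samples' is assumed to be a sequence of
# floating-point samples obtained elsewhere
a = loris.Analyzer( 110 )
a.setFreqFloor( 90 )
partials = a.analyze( samples, 44100 )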
Automated Reference Envelope Construction
The reference (fundamental) frequency envelope for channelization can often be constructed automatically. The createF0Estimate function in the Loris procedural interface uses a fundamental frequency tracker to construct the reference frequency envelope from the analyzed partials.
# create an envelope that tracks the fundamental frequency
# between 80 and 140 Hz, sampled every 10 ms
env = loris.createF0Estimate( partials, 80, 140, 0.01 )
createF0Estimate estimates the fundamental frequency in a specified range and constructs a reference frequency envelope from samples of this estimate at a specified time interval. The second and third arguments to createF0Estimate are the minimum and maximum frequencies for the reference envelope; partials that stray outside this frequency range are not considered in the estimate. The fourth argument is the interval (in seconds) at which the fundamental frequency is estimated.
Channelizing by Hand
You can use methods in the Partial and PartialList classes to perform operations on Loris analysis data.
For example, this function labels only partials that are active between 1 and 3 seconds with their harmonic number.
def labelHarmonic( partial, f0 ):
    if partial.startTime() < 3.0 and partial.endTime() > 1.0:
        freq = partial.frequencyAt( 1.0 )
        harmnum = int( round( freq / f0 ) )
        partial.setLabel( harmnum )
The methods startTime and endTime are invoked to determine whether a partial is active between 1 and 3 seconds. frequencyAt returns the interpolated frequency of a Partial at the specified time (1 second here). setLabel sets the label assigned to a partial.
Invoke this labelHarmonic function on each partial in a PartialList (such as the result of an analysis, or data imported from an SDIF file), passing a single partial along with the fundamental frequency estimate.
f0 = 125
for p in partials:
    labelHarmonic( p, f0 )
PartialLists are iterable like other Python sequences. Similarly, Partials are iterable sequences of Breakpoints:
for bp in aPartial:
    # do something with each Breakpoint
    ...
Using similar techniques, you can effect a great variety of manipulations and transformations of Loris analysis data.
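For example, this sketch halves the amplitude of every breakpoint in a set of partials (assuming the Breakpoint accessors amplitude and setAmplitude):
# halve the amplitude of every Breakpoint in every Partial
for p in partials:
    for bp in p:
        bp.setAmplitude( 0.5 * bp.amplitude() )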
Distillation
The sound morphing algorithm in Loris requires that the partials in each source be labeled uniquely, that is, no two partials in a single source can have the same label.
Distillation is the process for enforcing this condition. All partials identified with a particular channel, and therefore having a common label, are fused, or distilled into a single partial, leaving at most one partial per frequency channel and label.
The distill function in the Loris module distills a PartialList.
loris.distill( partials )
When the partials in a frequency channel do not overlap in time, then distillation is simply a process of linking partials end to end, and inserting silence between the endpoints.
When partials in a frequency channel overlap temporally, the strongest (i.e. having the most energy) of the overlapping partials is selected to construct the distilled partial. The energy in the weaker partials is absorbed as noise energy by the stronger partial.
Sifting
In some cases, the energy redistribution effected by the distiller is undesirable. In such cases, the partials can be sifted before distillation. Sifting is performed by the sift function in the Loris module.
loris.sift( partials )
In the sifting process, partials that would be rejected in distillation are assigned the label 0. These sifted partials can then be identified and treated separately or removed altogether (using the removeLabeled or extractLabeled function), or they can simply be left unlabeled.
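For example, to discard the sifted partials entirely (a sketch, assuming removeLabeled takes the partials and the label to remove):
loris.sift( partials )
# discard the sifted partials, which now carry label 0
loris.removeLabeled( partials, 0 )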
Collating
Distilling affects only labeled partials. Unlabeled partials (partials labeled 0) are not processed by the distiller; they are simply gathered together at the end of the collection, after all the distilled, labeled partials.
Collating is a process similar to distilling, but affecting only unlabeled partials. Collating joins non-overlapping unlabeled partials to produce the smallest-possible number of partials.
Collating helps reduce the amount of partial data exported, and can speed up some operations, but because only non-overlapping partials are joined, the samples generated by synthesizing the partials are the same before and after collating.
Collating is performed by the collate function in the Loris module.
loris.collate( partials )
Collated partials, like all unlabeled partials, do not participate in morphing.
Temporal Feature Alignment
Significant temporal features of the source sounds must be synchronized in order to achieve good morphing results. If related temporal features (such as the end of the attack, or the beginning of the release) occur at different times in two sounds, then a morph between those sounds will be unsatisfying.
Loris provides a dilation mechanism for non-uniformly expanding and contracting partials to redistribute temporal events. For example, when morphing instrument tones, it is common to align the attack, sustain, and release portions of the source sounds by dilating or contracting those temporal regions.
The arguments to dilate are the partials to transform and two sequences of time points: the times before the transformation and the corresponding times after the transformation.
For example, to stretch the middle part of a cat's meow (a sound 1.2 seconds long), you might define the following sequences:
# time points for dilation
itimes = [ 0, 0.25, 0.75, 1 ]
ttimes = [ 0, 0.25, 2.75, 3 ]
(Both sequences must be the same length, of course.) Then invoke dilate this way:
loris.dilate( partials, itimes, ttimes )
Dilation can also be used to synchronize a sound morph with a visual sequence, such as a computer animation. Temporal features of the morphed sound must be aligned with visual events in the animation in order to make the relationship between sound and visuals believable or seemingly "natural".
Dilation can occur before or after distillation, but is an essential component in controlling the evolution of the morph.
Final Preparations
The sources in a sound morph need not be distilled using identical sets of frequency channels. However, large frequency sweeps will dominate other audible effects of the morph, so care must be taken to coordinate the frequency channels used in the distillation process. For example, quasi-harmonic sounds of different pitches may be pitch-aligned (shifted to a common pitch) before morphing:
# shift pitch up by 100 cents
loris.shiftPitch( partials, 100 )
Though the harmonic frequency structure described by the channelization process may not be a good representation of the frequency structure of a particular sound (as in the case of a non-harmonic bell sound for example), it may still yield good morphing results by labeling partials in such a way as to prevent dramatic frequency sweeps.
In some cases, the dramatic effect of a morph, and its apparent "realism" are enhanced by applying frequency or amplitude deformations that are synchronous with the evolution of the morph. This enhanced realism is particularly important when the sound morph is to be coupled with animation.
Performing the Morph
Labeled and distilled sets of partials are morphed by interpolating the envelopes of corresponding partials according to specified morphing envelopes. There are separate morphing envelopes controlling the interpolation of frequency, amplitude, and bandwidth (or noisiness), though in practice, the same envelope is often used to control all three parameters.
The morphing envelopes are represented by LinearEnvelopes in the Loris Python module. The following statements construct a morphing envelope that evolves from the first source to the second between 0.5 and 1.5 seconds.
env = loris.LinearEnvelope()
env.insertBreakpoint( 0.5, 0 )
env.insertBreakpoint( 1.5, 1 )
The morph function in the Loris module performs the morph between two sets of prepared partials (PartialLists) and returns a new PartialList containing the morphed partials.
morphed_partials = loris.morph( src0, src1, env, env, env )
The first two arguments are the source partial sets in the morph. The next three arguments are the morphing envelopes for frequency, amplitude, and noisiness, respectively. In general, it is reasonable to use the same envelope for amplitude and noisiness, and often for all three parameters.
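For example, to morph frequencies over a shorter interval than amplitudes and noisiness, you might construct two envelopes (the times here are hypothetical):
# frequencies morph between 0.5 and 1.0 seconds,
# amplitudes and noisiness between 0.5 and 1.5 seconds
fenv = loris.LinearEnvelope()
fenv.insertBreakpoint( 0.5, 0 )
fenv.insertBreakpoint( 1.0, 1 )
aenv = loris.LinearEnvelope()
aenv.insertBreakpoint( 0.5, 0 )
aenv.insertBreakpoint( 1.5, 1 )
morphed_partials = loris.morph( src0, src1, fenv, aenv, aenv )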
Partials in one distilled source that have no corresponding partial in the other source(s) are crossfaded according to the amplitude morphing envelope.
Source partials may also be unlabeled, or assigned the label 0, to indicate that they have no correspondence with other sources in the morph. All unlabeled partials in a morph are crossfaded according to the amplitude morphing envelope. Only the amplitude (not the frequency or bandwidth) of unlabeled partials is affected by the morph.
Morphing Example
This program performs a morph between a flute tone and a clarinet tone. The flute and the clarinet were analyzed previously, and their partials stored in Sound Description Interchange Format (SDIF) files. The partials are imported, channelized, distilled, pitch-aligned, dilated, and morphed. The morphed partials are rendered and the samples exported to a new AIFF file.
import loris

# import the raw clarinet partials
clar = loris.importSdif( 'clarinet.sdif' )

# channelize and distill, assume 415 Hz fundamental
env = loris.LinearEnvelope()
env.insertBreakpoint( 0, 415 )
loris.channelize( clar, env, 1 )
loris.distill( clar )

# shift pitch of clarinet partials down by 600 cents
loris.shiftPitch( clar, -600 )

# import the raw flute partials
flut = loris.importSdif( 'flute.sdif' )

# channelize and distill, track and estimate the fundamental
refenv = loris.createF0Estimate( flut, 291*.8, 291*1.2, 0.01 )
loris.channelize( flut, refenv, 1 )
loris.distill( flut )

# perform temporal dilation, align onsets
flute_times = [ 0.4, 1. ]
clar_times = [ 0.2, 1. ]
tgt_times = [ 0.3, 1.2 ]
print( 'dilating sounds to align onsets' )
loris.dilate( flut, flute_times, tgt_times )
loris.dilate( clar, clar_times, tgt_times )

# perform the morph
print( 'morphing clarinet with flute' )
morphenv = loris.LinearEnvelope()
morphenv.insertBreakpoint( 0.6, 0 )
morphenv.insertBreakpoint( 2, 1 )
mrph = loris.morph( clar, flut, morphenv, morphenv, morphenv )

# synthesize and export the samples
samples = loris.synthesize( mrph, 44100 )
loris.exportAiff( 'morph.aiff', samples, 44100 )
print( 'Done, bye.' )
Download
You can download the Python code for this example, or just cut and paste it into a text editor.
You can download the Loris software package from our project website at SourceForge.