Sound Morphing Using Loris
Many kinds of sound transformations have been characterized as sound morphing. Generally, these transformations involve two (or more) source timbres and produce a sound that has timbral characteristics of both sources.
We define sound morphing more narrowly to be a transformation between source timbres governed by a morphing envelope, bounded by 0 and 1, in which the original source timbres are obtained at the extrema of the morphing envelope. That is, one source sound is produced when the envelope has value 0, the other source is produced when the envelope has value 1, and hybrid sounds are obtained from intermediate envelope values.
Sound morphing using traditional additive sound models is straightforward. For quasi-harmonic sounds, in which each harmonic is represented by a single sinusoidal partial, the time-varying frequencies and amplitudes of the quasi-harmonic partials in the morphed sound can be obtained by a weighted interpolation of the time-varying frequencies and amplitudes of corresponding partials in the source sounds.
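Schematically, if α(t) is the morphing envelope and f1(t) and f2(t) are the time-varying frequencies of a pair of corresponding partials, then the morphed frequency is f(t) = ( 1 − α(t) ) · f1(t) + α(t) · f2(t), and amplitudes are interpolated analogously.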
In Loris, though the process of partial construction is different, the morphing process is fundamentally similar. Sound morphing is achieved by interpolating the time-varying frequency, amplitude, and bandwidth (or noisiness) of corresponding partials obtained from analyses of the source sounds.
Before proceeding with this tutorial, make sure that you have understood the Loris sound modeling tutorial.
Establishing Partial Correspondences
The description of sounds as quasi-harmonic implies a natural correspondence between partials having the same harmonic number. For non-harmonic or polyphonic sounds, however, there may be no obvious correspondence between partials in the source sounds, or there may be many possible correspondences. Loris provides mechanisms for explicitly establishing correspondences between source partials.
Correspondences between partials in the source sounds are established by labeling the partials in the source sounds. Partials in each source sound are assigned unique identifiers, or labels, and partials having the same label are morphed by interpolating their frequency, amplitude, and bandwidth envelopes according to the morphing envelope.
The product of a morph is a new set of partials, consisting of a single partial for each label represented in any of the source sounds.
Channelization
Channelization is an automated process of labeling the partials in an analyzed sound. Partials can be labeled one by one, but if the sound has a known, simple frequency structure, an automated process is much more efficient.
For quasi-harmonic sounds, partials are most often labeled according to their harmonic number. The frequency spectrum is partitioned into non-overlapping channels having center frequencies that are harmonic (integer) multiples of the fundamental frequency, and each channel is identified by a unique label equal to its harmonic number. Each partial is assigned the label corresponding to the channel containing the greatest portion of its (the partial's) energy.
For example, if the fundamental frequency of a tone is 110 Hz, then channel 1 is centered at 110 Hz, channel 2 at 220 Hz, channel 3 at 330 Hz, and so on. The width of the channels is equal to the difference between adjacent center frequencies, 110 Hz in this example, so each channel covers a region 55 Hz above and below its center frequency.
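As a sketch of this arithmetic (illustrative only, since channelization actually weighs each partial's energy distribution over time rather than a single frequency value):
# illustrative only: channelize assigns labels using each partial's
# energy distribution, not a single frequency value
f0 = 110.0                           # channel 1 center frequency, in Hz
freq = 335.0                         # a frequency of interest
channel = int( round( freq / f0 ) )  # 335 Hz lies in channel 3 (330 +/- 55 Hz)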
Since the fundamental frequency may vary over time, the center frequency of a channel is specified in Loris by a LinearEnvelope.
To construct a LinearEnvelope having a constant value (like 110 Hz), you need only add a single breakpoint to the envelope, as in
env = loris.LinearEnvelope()
env.insertBreakpoint( 0, 110 )
The envelope can be made more interesting, if necessary, by adding more breakpoints to track changes in the fundamental frequency of the sound. A spectrogram tool is sometimes useful for estimating changes in fundamental frequency.
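For example, a reference envelope for a tone whose fundamental drifts slightly might be built like this (the times and frequencies here are hypothetical):
# hypothetical fundamental estimates, e.g. read from a spectrogram
env = loris.LinearEnvelope()
env.insertBreakpoint( 0.0, 110 )   # 110 Hz at the onset
env.insertBreakpoint( 0.8, 112 )   # drifting slightly sharp
env.insertBreakpoint( 1.5, 108 )   # settling slightly flat at the end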
Such an envelope is used as a reference frequency envelope in the channelization process. The channelize function in the Loris procedural interface uses a reference frequency envelope to define a full set of frequency channels and to label a set of partials according to those channels.
loris.channelize( partials, env, 1 )
The third argument to channelize is the number of the channel whose center frequency is described by the reference frequency envelope (1 if it is the fundamental, 3 if it is the third harmonic, and so on). The other channels are constructed around all the other harmonics.
Note that, due to the symmetry of the frequency channels, there is a frequency region below half the frequency of the first channel that is not covered by any channel, so very low-frequency partials may remain unlabeled after channelization. In practice, it is unusual to find any partials in this region, and their presence generally indicates a poor choice of analysis or channelization parameters. The frequency floor parameter of the Loris analyzer can be used to ensure that no such low-frequency partials are constructed.
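For example, a minimal sketch (assuming the one-argument Analyzer constructor and its setFreqFloor method, and that the source samples have been obtained elsewhere):
import loris

# analyze with a frequency resolution of 110 Hz, constructing no
# partials below 90 Hz; 'samples' is assumed to be a sequence of
# floating-point samples obtained elsewhere
a = loris.Analyzer( 110 )
a.setFreqFloor( 90 )
partials = a.analyze( samples, 44100 )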
Automated Reference Envelope Construction
The reference (fundamental) frequency envelope for channelization can often be constructed automatically. The createF0Estimate function in the Loris procedural interface uses a fundamental frequency tracker to construct the reference frequency envelope from the analyzed partials.
# create an envelope that tracks the fundamental frequency
# between 80 and 140 Hz, sampled every 10 ms
env = loris.createF0Estimate( partials, 80, 140, 0.01 )
createF0Estimate estimates the fundamental frequency in a specified range and constructs a reference frequency envelope from samples of this estimate at a specified time interval. The second and third arguments to createF0Estimate are the minimum and maximum frequencies for the reference envelope; partials that stray outside this frequency range are not considered in the estimate. The fourth argument is the interval (in seconds) at which the fundamental frequency is estimated.
Channelizing by Hand
You can use methods in the Partial and PartialList classes to perform operations on Loris analysis data.
For example, this function labels only partials that are active between 1 and 3 seconds with their harmonic number.
def labelHarmonic( partial, f0 ):
    if partial.startTime() < 3.0 and partial.endTime() > 1.0:
        freq = partial.frequencyAt( 1.0 )
        harmnum = int( round( freq / f0 ) )
        partial.setLabel( harmnum )
The methods startTime and endTime are invoked to determine whether a partial is active between 1 and 3 seconds. frequencyAt returns the interpolated frequency of a Partial at the specified time (1 second here). setLabel sets the label assigned to a partial.
Invoke this labelHarmonic function on each partial in a PartialList (such as the result of an analysis, or data imported from an SDIF file), passing a single partial along with the fundamental frequency estimate.
f0 = 125
for p in partials:
    labelHarmonic( p, f0 )
PartialLists are iterable like other Python sequences. Similarly, Partials are iterable sequences of Breakpoints:
for bp in aPartial:
    # do something with each Breakpoint
    ...
Using similar techniques, you can effect a great variety of manipulations and transformations of Loris analysis data.
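For example, this sketch halves the amplitude of every breakpoint in a set of partials (assuming the Breakpoint accessors amplitude and setAmplitude):
# halve the amplitude of every Breakpoint in every Partial
for p in partials:
    for bp in p:
        bp.setAmplitude( 0.5 * bp.amplitude() )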
Distillation
The sound morphing algorithm in Loris requires that the partials in each source be labeled uniquely, that is, no two partials in a single source can have the same label.
Distillation is the process for enforcing this condition. All partials identified with a particular channel, and therefore having a common label, are fused, or distilled into a single partial, leaving at most one partial per frequency channel and label.
The distill function in the Loris module distills a PartialList.
loris.distill( partials )
When the partials in a frequency channel do not overlap in time, then distillation is simply a process of linking partials end to end, and inserting silence between the endpoints.
When partials in a frequency channel overlap temporally, the strongest (i.e. having the most energy) of the overlapping partials is selected to construct the distilled partial. The energy in the weaker partials is absorbed as noise energy by the stronger partial.
Sifting
In some cases, the energy redistribution effected by the distiller is undesirable. In such cases, the partials can be sifted before distillation. Sifting is performed by the sift function in the Loris module.
loris.sift( partials )
In the sifting process, partials that would be rejected in distillation are assigned the label 0. These sifted partials can then be identified and treated separately or removed altogether (using the removeLabeled or extractLabeled function), or they can simply be left unlabeled.
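For example, to discard the sifted partials entirely (a sketch, assuming removeLabeled takes the partials and the label to remove):
loris.sift( partials )
# discard the sifted partials, which now carry label 0
loris.removeLabeled( partials, 0 )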
Collating
Distilling affects only labeled partials. Unlabeled partials (partials labeled 0) are not processed by the distiller; they are simply gathered together at the end of the collection, after all the distilled, labeled partials.
Collating is a process similar to distilling, but affecting only unlabeled partials. Collating joins non-overlapping unlabeled partials to produce the smallest-possible number of partials.
Collating helps reduce the amount of partial data exported, and can speed up some operations, but because only non-overlapping partials are joined, the samples generated by synthesizing the partials are the same before and after collating.
Collating is performed by the collate function in the Loris module.
loris.collate( partials )
Collated partials, like all unlabeled partials, do not participate in morphing.
Temporal Feature Alignment
Significant temporal features of the source sounds must be synchronized in order to achieve good morphing results. If related temporal features (such as the end of the attack, or the beginning of the release) occur at different times in two sounds, then a morph between those sounds will be unsatisfying.
Loris provides a dilation mechanism for non-uniformly expanding and contracting partials to redistribute temporal events. For example, when morphing instrument tones, it is common to align the attack, sustain, and release portions of the source sounds by dilating or contracting those temporal regions.
The arguments to dilate are the partials to transform and two sequences of time points: the times before the transformation and the corresponding times after the transformation.
For example, to stretch the middle part of a cat's meow (a sound 1.2 seconds long), you might define the following sequences:
# time points for dilation
itimes = [ 0, 0.25, 0.75, 1 ]
ttimes = [ 0, 0.25, 2.75, 3 ]
(Both sequences must be the same length, of course.) Then invoke dilate this way:
loris.dilate( partials, itimes, ttimes )
Dilation can also be used to synchronize a sound morph with a visual sequence, such as a computer animation. Temporal features of the morphed sound must be aligned with visual events in the animation in order to make the relationship between sound and visuals believable or seemingly "natural".
Dilation can occur before or after distillation, but is an essential component in controlling the evolution of the morph.
Final Preparations
The sources in a sound morph need not be distilled using identical sets of frequency channels. However, large frequency sweeps will dominate other audible effects of the morph, so care must be taken to coordinate the frequency channels used in the distillation process. For example, quasi-harmonic sounds of different pitches may be pitch-aligned (shifted to a common pitch) before morphing:
# shift pitch up by 100 cents
loris.shiftPitch( partials, 100 )
Though the harmonic frequency structure described by the channelization process may not be a good representation of the frequency structure of a particular sound (as in the case of a non-harmonic bell sound for example), it may still yield good morphing results by labeling partials in such a way as to prevent dramatic frequency sweeps.
In some cases, the dramatic effect of a morph, and its apparent "realism" are enhanced by applying frequency or amplitude deformations that are synchronous with the evolution of the morph. This enhanced realism is particularly important when the sound morph is to be coupled with animation.
Performing the Morph
Labeled and distilled sets of partials are morphed by interpolating the envelopes of corresponding partials according to specified morphing envelopes. There are separate morphing envelopes controlling the interpolation of frequency, amplitude, and bandwidth (or noisiness), though in practice, the same envelope is often used to control all three parameters.
The morphing envelopes are represented by LinearEnvelopes in the Loris Python module. The following statements construct a morphing envelope that evolves from the first source to the second between 0.5 and 1.5 seconds.
env = loris.LinearEnvelope()
env.insertBreakpoint( 0.5, 0 )
env.insertBreakpoint( 1.5, 1 )
The morph function in the Loris module performs the morph between two sets of prepared partials (PartialLists) and returns a new PartialList containing the morphed partials.
morphed_partials = loris.morph( src0, src1, env, env, env )
The first two arguments are the source partial sets in the morph. The next three arguments are the morphing envelopes for frequency, amplitude, and noisiness, respectively. In general, it is reasonable to use the same envelope for amplitude and noisiness, and often for all three parameters.
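For example, to morph frequencies over a shorter interval than amplitudes and noisiness, you might construct two envelopes (the times here are hypothetical):
# frequencies morph between 0.5 and 1.0 seconds,
# amplitudes and noisiness between 0.5 and 1.5 seconds
fenv = loris.LinearEnvelope()
fenv.insertBreakpoint( 0.5, 0 )
fenv.insertBreakpoint( 1.0, 1 )
aenv = loris.LinearEnvelope()
aenv.insertBreakpoint( 0.5, 0 )
aenv.insertBreakpoint( 1.5, 1 )
morphed_partials = loris.morph( src0, src1, fenv, aenv, aenv )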
Partials in one distilled source that have no corresponding partial in the other source(s) are crossfaded according to the amplitude morphing envelope.
Source partials may also be unlabeled, or assigned the label 0, to indicate that they have no correspondence with other sources in the morph. All unlabeled partials in a morph are crossfaded according to the amplitude morphing envelope. Only the amplitude (not the frequency or bandwidth) of unlabeled partials is affected by the morph.
Morphing Example
This program performs a morph between a flute tone and a clarinet tone. The flute and the clarinet were analyzed previously, and their partials stored in Sound Description Interchange Format (SDIF) files. The partials are imported, channelized, distilled, pitch-aligned, dilated, and morphed. The morphed partials are rendered and the samples exported to a new AIFF file.
import loris

# import the raw clarinet partials
clar = loris.importSdif( 'clarinet.sdif' )

# channelize and distill, assume 415 Hz fundamental
env = loris.LinearEnvelope()
env.insertBreakpoint( 0, 415 )
loris.channelize( clar, env, 1 )
loris.distill( clar )

# shift pitch of clarinet partials down by 600 cents
loris.shiftPitch( clar, -600 )

# import the raw flute partials
flut = loris.importSdif( 'flute.sdif' )

# channelize and distill, track and estimate the fundamental
refenv = loris.createF0Estimate( flut, 291*.8, 291*1.2, 0.01 )
loris.channelize( flut, refenv, 1 )
loris.distill( flut )

# perform temporal dilation, align onsets
flute_times = [ 0.4, 1. ]
clar_times = [ 0.2, 1. ]
tgt_times = [ 0.3, 1.2 ]
print( 'dilating sounds to align onsets' )
loris.dilate( flut, flute_times, tgt_times )
loris.dilate( clar, clar_times, tgt_times )

# perform the morph
print( 'morphing clarinet with flute' )
morphenv = loris.LinearEnvelope()
morphenv.insertBreakpoint( 0.6, 0 )
morphenv.insertBreakpoint( 2, 1 )
mrph = loris.morph( clar, flut, morphenv, morphenv, morphenv )

# synthesize and export the samples
samples = loris.synthesize( mrph, 44100 )
loris.exportAiff( 'morph.aiff', samples, 44100 )
print( 'Done, bye.' )
Download
You can download the Python code for this example, or just cut and paste it into a text editor.
You can download the Loris software package from our project website at SourceForge.