Audio time stretching and pitch scaling

From Wikipedia, the free encyclopedia

Changing the speed or duration of an audio signal without affecting its pitch

"Timestretch" redirects here. For the album, see Timestretch (album).

Time stretching is the process of changing the speed or duration of an audio signal without affecting its pitch. Pitch scaling is the opposite: the process of changing the pitch without affecting the speed. Pitch shift is pitch scaling implemented in an effects unit and intended for live performance. Pitch control is a simpler process which affects pitch and speed simultaneously by slowing down or speeding up a recording.

These processes are often used to match the pitches and tempos of two pre-recorded clips for mixing when the clips cannot be reperformed or resampled. Time stretching is often used to adjust radio commercials^[1] and the audio of television advertisements^[2] to fit exactly into the 30 or 60 seconds available. It can be used to conform longer material to a designated time slot, such as a 1-hour broadcast.

Resampling

The simplest way to change the duration or pitch of an audio recording is to change the playback speed. For a digital audio recording, this can be accomplished through sample rate conversion. With this method, the frequencies in the recording are always scaled by the same ratio as the speed, transposing its pitch up or down in the process. Slowing down the recording to increase its duration lowers the pitch, while speeding it up to shorten its duration raises the pitch, creating the so-called Chipmunk effect. When resampling audio to a notably lower pitch, a source with a higher sample rate may be preferable, because slowing down playback lowers the effective resolution of the reproduced signal and therefore reduces the perceived clarity of the sound. When resampling audio to a notably higher pitch, it may be preferable to incorporate an interpolation (low-pass) filter, because frequencies that surpass the Nyquist frequency (determined by the sampling rate of the audio reproduction software or device) will cause distortion due to aliasing.
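The coupling between speed and pitch can be seen in a minimal sketch. The code below is an illustration only, not a production resampler: the function names `resample_linear` and `estimate_frequency`, the 8000 Hz sample rate, and the 440 Hz test tone are all assumptions chosen for the example, and linear interpolation stands in for a proper sample rate converter with band-limiting filters.

```python
import math

def resample_linear(samples, speed):
    """Read `samples` at `speed` times the original rate (linear interpolation)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int((len(samples) - 1) / speed) + 1
    out = []
    for i in range(n_out):
        pos = i * speed                      # fractional read position in the input
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append((1.0 - frac) * samples[j] + frac * nxt)
    return out

def estimate_frequency(samples, sample_rate):
    """Rough frequency estimate from the number of upward zero crossings."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0.0 <= b)
    return crossings * sample_rate / len(samples)

SAMPLE_RATE = 8000  # assumed playback rate, kept fixed for all signals
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

slow = resample_linear(tone, 0.5)  # half speed: about twice as long
fast = resample_linear(tone, 2.0)  # double speed: about half as long

for name, sig in (("original", tone), ("slow", slow), ("fast", fast)):
    print(name, len(sig), round(estimate_frequency(sig, SAMPLE_RATE)))
```

Played back at the same fixed rate, the half-speed version is about twice as long and sounds roughly an octave lower (near 220 Hz), while the double-speed version is about half as long and roughly an octave higher (near 880 Hz).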

Frequency domain

Phase vocoder

Main article: Phase vocoder

One way of stretching the length of a signal without affecting the pitch is to build a phase vocoder after Flanagan, Golden, and Portnoff.

Basic steps:

1. compute the instantaneous frequency/amplitude relationship of the signal using the STFT, which is the discrete Fourier transform of a short, overlapping and smoothly windowed block of samples;
2. apply some processing to the Fourier transform magnitudes and phases (like resampling the FFT blocks); and
3. perform an inverse STFT by taking the inverse Fourier transform on each chunk and adding the resulting waveform chunks, also called overlap and add (OLA).^[3]
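The analysis and resynthesis steps can be sketched as follows. This is deliberately not a complete phase vocoder: the processing step is left as the identity, so the sketch only shows that a windowed STFT followed by overlap-add returns the input. The naive O(N²) DFT, the periodic Hann window, the 50% overlap, and all function names are assumptions made for the illustration.

```python
import cmath
import math

def dft(frame):
    """Naive discrete Fourier transform (fine for short frames)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def hann(n):
    """Periodic Hann window; copies offset by n/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

def stft(signal, frame_len, hop):
    """Step 1: DFT of short, overlapping, smoothly windowed blocks."""
    window = hann(frame_len)
    return [dft([signal[start + t] * window[t] for t in range(frame_len)])
            for start in range(0, len(signal) - frame_len + 1, hop)]

def istft(frames, frame_len, hop):
    """Step 3: inverse DFT of each block, then overlap and add."""
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, spectrum in enumerate(frames):
        for t, value in enumerate(idft(spectrum)):
            out[i * hop + t] += value
    return out

FRAME_LEN, HOP = 32, 16
signal = [math.sin(2 * math.pi * 5 * n / 256) + 0.3 * math.sin(2 * math.pi * 21 * n / 256)
          for n in range(256)]

frames = stft(signal, FRAME_LEN, HOP)
# Step 2 would modify the magnitudes and phases here; this sketch leaves them unchanged.
rebuilt = istft(frames, FRAME_LEN, HOP)

# Ignore the first and last half frame, where only one window contributes.
max_error = max(abs(rebuilt[n] - signal[n]) for n in range(HOP, len(rebuilt) - HOP))
print(max_error)
```

A time-stretching phase vocoder would use a different hop size for synthesis than for analysis and adjust the phases of each bin accordingly, which is where most of the difficulty lies.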

The phase vocoder handles sinusoid components well, but early implementations introduced considerable smearing on transient ("beat") waveforms at all non-integer compression/expansion rates, which rendered the results phasey and diffuse. Recent improvements allow better quality results at all compression/expansion ratios, but a residual smearing effect still remains.

The phase vocoder technique can also be used to perform pitch shifting, chorusing, timbre manipulation, harmonizing, and other unusual modifications, all of which can be changed as a function of time.

Figure: Sinusoidal analysis/synthesis system (based on McAulay & Quatieri 1988, p. 161).^[4]

Sinusoidal spectral modeling

Time domain

SOLA

Frame-based approach

In order to preserve an audio signal's pitch when stretching or compressing its duration, many time-scale modification (TSM) procedures follow a frame-based approach.^[6] Given an original discrete-time audio signal, this strategy's first step is to split the signal into short analysis frames of fixed length. The analysis frames are spaced by a fixed number of samples, called the analysis hopsize $H_a \in \mathbb{N}$. To achieve the actual time-scale modification, the analysis frames are then temporally relocated to have a synthesis hopsize $H_s \in \mathbb{N}$. This frame relocation results in a modification of the signal's duration by a stretching factor of $\alpha = H_s / H_a$. However, simply superimposing the unmodified analysis frames typically results in undesired artifacts such as phase discontinuities or amplitude fluctuations. To prevent these kinds of artifacts, the analysis frames are adapted to form synthesis frames, prior to the reconstruction of the time-scale modified output signal.

The strategy used to derive the synthesis frames from the analysis frames is a key difference among different TSM procedures.
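The frame relocation itself can be sketched with the simplest possible choice, in which each synthesis frame is just the windowed analysis frame (plain overlap-add). This is an illustrative baseline rather than any specific published procedure: the function name `ola_stretch`, the Hann window, the window-sum normalization, and the frame and hop sizes are assumptions. Because the frames are not adapted to each other, the output generally shows the phase discontinuities described above.

```python
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]

def ola_stretch(signal, frame_len, analysis_hop, synthesis_hop):
    """Relocate frames taken every `analysis_hop` samples to every `synthesis_hop` samples."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = hann(frame_len)
    starts = range(0, len(signal) - frame_len + 1, analysis_hop)
    out_len = synthesis_hop * (len(starts) - 1) + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # running sum of window weights, used to even out amplitude
    for m, start in enumerate(starts):
        for t in range(frame_len):
            out[m * synthesis_hop + t] += window[t] * signal[start + t]
            norm[m * synthesis_hop + t] += window[t]
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]

H_A, H_S = 64, 128          # analysis and synthesis hopsizes
alpha = H_S / H_A           # stretching factor, here 2.0
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
stretched = ola_stretch(signal, 256, H_A, H_S)
print(len(signal), len(stretched), len(stretched) / len(signal))
```

The output is close to `alpha` times as long as the input; the small shortfall comes from the samples at the end of the input that do not fill a complete frame.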

Speed hearing and speed talking

For the specific case of speech, time stretching can be performed using PSOLA.

Time-compressed speech is the representation of verbal text in compressed time. While one might expect speeding up to reduce comprehension, Herb Friedman says that "Experiments have shown that the brain works most efficiently if the information rate through the ears—via speech—is the 'average' reading rate, which is about 200–300 wpm (words per minute), yet the average rate of speech is in the neighborhood of 100–150 wpm."^[7]

Listening to time-compressed speech is seen as the equivalent of speed reading.^[8]^[9]

Pitch scaling

Figure: Pitch shifting (frequency scaling) as provided by the Eventide Harmonizer.

Figure: Frequency shifting, as provided by the Bode Frequency Shifter, does not preserve frequency ratios or harmony.

These techniques can also be used to transpose an audio sample while holding speed or duration constant. This may be accomplished by time stretching and then resampling back to the original length. Alternatively, the frequency of the sinusoids in a sinusoidal model may be altered directly, and the signal reconstructed at the appropriate time scale.
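The bookkeeping for the stretch-then-resample route can be sketched in a few lines. The helper `pitch_shift_plan` is hypothetical and only computes lengths; it assumes some pitch-preserving time stretcher and some resampler are available and does not implement either.

```python
def pitch_shift_plan(n_samples, semitones):
    """Lengths involved in raising pitch by `semitones` at constant duration."""
    ratio = 2.0 ** (semitones / 12.0)              # desired frequency ratio
    stretched_len = round(n_samples * ratio)       # after time stretching by `ratio` (pitch unchanged)
    resampled_len = round(stretched_len / ratio)   # after playing the result `ratio` times faster
    return ratio, stretched_len, resampled_len

# One second at 44.1 kHz, shifted up an octave and up seven semitones.
print(pitch_shift_plan(44100, 12))
print(pitch_shift_plan(44100, 7))
```

For an upward shift the signal is first lengthened without changing pitch, and the faster playback then raises the pitch by the same ratio while restoring the original length. A downward shift uses a ratio below 1, so the stretch step shortens the signal instead.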

Transposing can be called frequency scaling or pitch shifting, depending on perspective.

For example, one could move the pitch of every note up by a perfect fifth, keeping the tempo the same. One can view this transposition as "pitch shifting", "shifting" each note up 7 keys on a piano keyboard, or adding a fixed amount on the Mel scale, or adding a fixed amount in linear pitch space. One can view the same transposition as "frequency scaling", "scaling" (multiplying) the frequency of every note by 3/2.
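The two views can be related numerically. The sketch below assumes twelve-tone equal temperament, in which each piano key corresponds to a frequency ratio of 2^(1/12); the function names are illustrative. Under that assumption, seven keys is very close to, but not exactly, the just ratio 3/2.

```python
import math

def semitones_to_ratio(semitones):
    """Frequency ratio for a shift of `semitones` equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)

def ratio_to_semitones(ratio):
    """Shift in equal-tempered semitones for a given frequency ratio."""
    return 12.0 * math.log2(ratio)

tempered_fifth = semitones_to_ratio(7)   # "shifting" up 7 keys
just_fifth = 3 / 2                       # "scaling" by 3/2
difference_cents = 100.0 * (ratio_to_semitones(just_fifth) - 7)

print(round(tempered_fifth, 4))    # 1.4983
print(round(difference_cents, 2))  # 1.96
```

The difference of about two cents (hundredths of a semitone) is small, which is why the two descriptions are used interchangeably in this example.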

Musical transposition preserves the ratios of the harmonic frequencies that determine the sound's timbre, unlike the frequency shift performed by amplitude modulation, which adds a fixed frequency offset to the frequency of every note. (In theory one could perform a literal pitch scaling in which the musical pitch space location is scaled [a higher note would be shifted at a greater interval in linear pitch space than a lower note], but that is highly unusual, and not musical.)
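A small numeric sketch makes the difference concrete. The three harmonics of a 220 Hz tone, the scaling factor 1.5, and the 110 Hz offset are arbitrary values chosen so that both operations move the fundamental to 330 Hz.

```python
def ratios(freqs):
    """Each frequency relative to the lowest one."""
    return [f / freqs[0] for f in freqs]

harmonics = [220.0, 440.0, 660.0]            # fundamental plus two overtones

scaled = [f * 1.5 for f in harmonics]        # frequency scaling (transposition)
shifted = [f + 110.0 for f in harmonics]     # frequency shifting (fixed offset)

print(scaled, ratios(scaled))     # 330, 660, 990 with ratios 1 : 2 : 3
print(shifted, [round(r, 3) for r in ratios(shifted)])  # 330, 550, 770 with ratios 1 : 1.667 : 2.333
```

After scaling, the overtones are still integer multiples of the fundamental, so the tone remains harmonic. After shifting, they are not, which is why a frequency shifter produces inharmonic, bell-like or metallic results.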

Time domain processing works much better here, as smearing is less noticeable, but scaling vocal samples distorts the formants into a sort of Alvin and the Chipmunks-like effect, which may be desirable or undesirable. A process that preserves the formants and character of a voice involves analyzing the signal with a channel vocoder or LPC vocoder plus any of several pitch detection algorithms and then resynthesizing it at a different fundamental frequency.

A detailed description of older analog recording techniques for pitch shifting can be found at Alvin and the Chipmunks § Recording technique.

DJing

Time stretching and pitch scaling are used extensively by DJs in addition to beatmixing when playing and creating sets. In order to seamlessly blend two tracks together, the tempo of a track can be adjusted to match another track such that the beats line up. Pitch scaling is commonly used to retain the original pitch of a track whose tempo has been changed. Pitch scaling is also used by DJs for harmonic mixing, to transform tracks into compatible keys so that they sound pleasing when mixed together. Time stretching and pitch scaling are included in modern DJ hardware (CDJs and DJ controllers) and software (such as VirtualDJ, Mixxx, Serato and Rekordbox).
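The size of the pitch change that has to be compensated can be estimated with a short calculation. The sketch assumes the tempo is matched by changing playback speed alone (plain pitch control), and the tempos of 125 and 128 BPM are made-up example values.

```python
import math

def tempo_match(track_bpm, target_bpm):
    """Speed factor needed to match tempos, and the pitch change it causes on its own."""
    speed = target_bpm / track_bpm
    semitones = 12.0 * math.log2(speed)
    return speed, semitones

speed, drift = tempo_match(125.0, 128.0)
print(round(speed, 3))   # 1.024
print(round(drift, 2))   # 0.41
```

Speeding the 125 BPM track up by 2.4% raises it by roughly 0.4 semitones, which is enough to clash with another track in the same key. Time stretching, or pitch scaling by the opposite amount, removes that drift.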

Music production

Time stretching and pitch scaling are used in digital audio workstation software for working with music loops, sound clips which can be repeated and transposed to form a song. The pitch and tempo of multiple loops are aligned to create tracks. Notable software includes Acid Pro with its "Acidized" loops feature and FL Studio.

In consumer software

Pitch-corrected audio timestretch is found in every modern web browser as part of the HTML standard for media playback.^[10] Similar controls are ubiquitous in media applications and frameworks such as GStreamer and Unity.

References

[1] "Dolby, The Chipmunks And NAB2004". Archived from the original on 2008-05-27.
[2] "Variable speech". www.atarimagazines.com.
[3] Jont B. Allen (June 1977). "Short Time Spectral Analysis, Synthesis, and Modification by Discrete Fourier Transform". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-25 (3): 235–238.
[4] McAulay, R. J.; Quatieri, T. F. (1988). "Speech Processing Based on a Sinusoidal Model" (PDF). The Lincoln Laboratory Journal. 1 (2): 153–167. Archived from the original (PDF) on 2012-05-21. Retrieved 2014-09-07.
[5] David Malah (April 1979). "Time-domain algorithms for harmonic bandwidth reduction and time scaling of speech signals". IEEE Transactions on Acoustics, Speech, and Signal Processing. ASSP-27 (2): 121–133.
[6] Jonathan Driedger and Meinard Müller (2016). "A Review of Time-Scale Modification of Music Signals". Applied Sciences. 6 (2): 57. doi:10.3390/app6020057.
[7] "Variable Speech". Creative Computing. Vol. 9, No. 7. July 1983. p. 122.
[8] "Listen to podcasts in half the time". Archived from the original on 2011-08-29. Retrieved 2008-07-24.
[9] "Speeding iPods". Archived from the original on 2006-09-02.
[10] "HTMLMediaElement.playbackRate - Web APIs". MDN. Retrieved 1 September 2021.

External links

Time Stretching and Pitch Shifting Overview: a comprehensive overview of current time and pitch modification techniques by Stephan Bernsee
Stephan Bernsee's smbPitchShift C source code: C source code for doing frequency domain pitch manipulation
pitchshift.js from KievII: a JavaScript pitch shifter based on smbPitchShift code, from the open source KievII library
The Phase Vocoder: A Tutorial: a good description of the phase vocoder
New Phase-Vocoder Techniques for Pitch-Shifting, Harmonizing and Other Exotic Effects
A new Approach to Transient Processing in the Phase Vocoder
PICOLA and TDHS
How to build a pitch shifter: theory, equations, figures and performances of a real-time guitar pitch shifter running on a DSP chip
ZTX Time Stretching Library: free and commercial versions of a popular third-party time stretching library for iOS, Linux, Windows and Mac OS X
Elastique by zplane: commercial cross-platform library, mainly used by DJ and DAW manufacturers
Voice Synth from Qneo: specialized synthesizer for creative voice sculpting
TSM toolbox: free MATLAB implementations of various time-scale modification procedures
PaulStretch at the Wayback Machine (archived 2023-02-02): a well-known algorithm for extreme (>10×) time stretching
Bungee: open source and commercial libraries for real-time audio stretching
Rubber Band: open source library for time stretching and pitch shifting
SoundTouch: open-source library for changing the tempo, pitch and playback rate
