PDL::Audio - Some PDL functions intended for audio processing.
use PDL; use PDL::Audio;
Oh well ;) Not much "introductory documentation" has been written yet :(
Installing this distribution also installs pdlaudio-demo, which showcases some of the oeprators, and pdlaudio-birds
, which imites some bird calls with PDL::Audio. You should study them to get the hang of it.
Brackets around parameters indicate that the respective parameter is optional and will be replaced with some default value when absent (or undef
, which might be different in other packages).
The sampling frequency and duration are by default (see individual descriptions) given in cycles/sample (or samples in case of a duration). That means if you want to specify a duration of two seconds, you have to multiply by the sampling frequency in HZ, and if you want to specify a frequency of 440 Hz, you have to divide by the sampling frequency:
# Syntax: gen_oscil duration*, frequency/ $signal = gen_oscil 2*HZ, 440/HZ; # with a sampling frequency of 44100 Hertz: $signal = gen_oscil 2*44100, 440/44100; print describe_audio $signal, "\n"; playaudio $signal->scale2short;
To help you, the required unit is given as a type suffix in the parameter name. A "/" means that you have to divide by the sampling frequency (to convert from Hertz) and a suffix of "*" indicates that a multiplication is required.
Most parameters named "size", "duration" (or marked with "*") can be replaced by a piddle, which is then used to give length and from (mono/stereo).
The following header attributes are stored and evaluated by most functions. PDL::Audio provides mutator methods for all them (e.g.
print "samplerate is ", $pdl->rate; $pdl->comment("set the comment to this string");
The sampling rate in hz.
The filetype (wav, au etc..). Must be one of:
FILE_NEXT FILE_AIFC FILE_RIFF FILE_BICSF FILE_NIST FILE_INRS FILE_ESPS FILE_SVX FILE_VOC FILE_SNDT FILE_RAW FILE_SMP FILE_SD2 FILE_AVR FILE_IRCAM FILE_SD1 FILE_SPPACK FILE_MUS10 FILE_HCOM FILE_PSION FILE_MAUD FILE_IEEE FILE_DESKMATE FILE_DESKMATE_2500 FILE_MATLAB FILE_ADC FILE_SOUNDEDIT FILE_SOUNDEDIT_16 FILE_DVSM FILE_MIDI FILE_ESIGNAL FILE_SOUNDFONT FILE_GRAVIS FILE_COMDISCO FILE_GOLDWAVE FILE_SRFS FILE_MIDI_SAMPLE_DUMP FILE_DIAMONDWARE FILE_REALAUDIO FILE_ADF FILE_SBSTUDIOII FILE_DELUSION FILE_FARANDOLE FILE_SAMPLE_DUMP FILE_ULTRATRACKER FILE_YAMAHA_SY85 FILE_YAMAHA_TX16 FILE_DIGIPLAYER FILE_COVOX FILE_SPL FILE_AVI FILE_OMF FILE_QUICKTIME FILE_ASF FILE_YAMAHA_SY99 FILE_KURZWEIL_2000 FILE_AIFF FILE_AU
The filename (or file specification) used to load or save a file.
Specifies the type the underlying file format uses. The samples will always be in short or long signed format.
Must be one of
FORMAT_NO_SND FORMAT_16_LINEAR FORMAT_8_MULAW FORMAT_8_LINEAR FORMAT_32_FLOAT FORMAT_32_LINEAR FORMAT_8_ALAW FORMAT_8_UNSIGNED FORMAT_24_LINEAR FORMAT_64_DOUBLE FORMAT_16_LINEAR_LITTLE_ENDIAN FORMAT_32_LINEAR_LITTLE_ENDIAN FORMAT_32_FLOAT_LITTLE_ENDIAN FORMAT_64_DOUBLE_LITTLE_ENDIAN FORMAT_16_UNSIGNED FORMAT_16_UNSIGNED_LITTLE_ENDIAN FORMAT_24_LINEAR_LITTLE_ENDIAN FORMAT_32_VAX_FLOAT FORMAT_12_LINEAR FORMAT_12_LINEAR_LITTLE_ENDIAN FORMAT_12_UNSIGNED FORMAT_12_UNSIGNED_LITTLE_ENDIAN COMPATIBLE_FORMAT
PDL::Audio conviniently defines the following aliases for the following constants, that are already correct for the host byteorder:
FORMAT_ULAW_BYTE FORMAT_ALAW_BYTE FORMAT_LINEAR_BYTE FORMAT_LINEAR_SHORT FORMAT_LINEAR_USHORT FORMAT_LINEAR_LONG FORMAT_LINEAR_FLOAT FORMAT_LINEAR_DOUBLE
The file comment (if any).
The device to output audio. One of:
DEV_DEFAULT DEV_READ_WRITE DEV_ADAT_IN DEV_AES_IN DEV_LINE_OUT DEV_LINE_IN DEV_MICROPHONE DEV_SPEAKERS DEV_DIGITAL_IN DEV_DIGITAL_OUT DEV_DAC_OUT DEV_ADAT_OUT DEV_AES_OUT DEV_DAC_FILTER DEV_MIXER DEV_LINE1 DEV_LINE2 DEV_LINE3 DEV_AUX_INPUT DEV_CD_IN DEV_AUX_OUTPUT DEV_SPDIF_IN DEV_SPDIF_OUT
In addition to the exported constants described above (and later in the function descriptions), this module also exports the mathematical constants M_PI and M_2PI, so watch out for clashes!
Return the human-readable name of the file format with code format_code
.
Return the human-readable name of the sample type with code type_code
.
Describe the audio stream contained in piddle and return it as a string. A fresh piddle might return:
mono sound with 27411 samples
Whereas a freshly loaded soundfile might yield:
stereo sound with 27411 samples, original name "kongas.wav", type 2 (RIFF), rate 11025/s (duration 2.49s), format 7 (8-bit unsigned)
Reads audio data into the piddle. Options can be anything, most useful values are filetype
, rate
, channels
and format
. The returned piddle is represents "time" in the outer dimension, and samples in the inner (i.e. scalars for mono files, 2-vectors for stereo files):
[ [left0, right0], [left1, right1], [left2, right2], ...] # read any file $pdl = raudio "file.wav"; # read a file. if it is a raw file preset values $pdl = raudio "file.raw", filetype => FILE_RAW, rate => 44100, channels => 2;
Writes a pdl as a file. See raudio for options and format. The path and other metadata is taken from the header, whcih cna be overwritten using options, e.g.:
# write a file, using the header of another piddle $pdl->waudio ($orig_file->gethdr); # write pdl as .au file, take rate from the header $pdl->waudio (path => "piddle.au", filetype => FILE_AU, format => FORMAT_16_LINEAR;
Cuts the leading silence (i.e. all samples with absolute value < level) and returns the resulting part.
Cuts the trailing silence.
Calls cut_leading_silence
and cut_trailing_silence
and returns the result.
Play the piddle as an audio file. Options can be supplied either through the option hash (a hash-reference), through the pdl header or the options:
# play a piddle that has a valid header (e.g. from raudio) $pdl->playaudio; # play it with a different samplerate $pdl->playaudio(rate => 22050);
All of these functions generate appropriate waveforms with frequency freq
(cycles/sample) and phase phase
(0..1).
The duration
might be either a piddle (which gives the form of the output) or the number of samples to generate.
The output samples are between -1 and +1 (i.e. "-1 <= s <= +1").
The duty
parameter of the square generator influences the duty cycle of the signal. Zero means 50%-50%, 0.5 means 75% on, 25% off, -0.8 means 10% on, 90% off etc... Of course, the duty
parameter might also be a vector of size duration
.
Generates an interpolated envelope between the points given by xvals and yvals. When base == 1 (the default) then the values will be linearly interpolated, otherwise they follow an exponential curve that is bend inwards (base < 1) or outwards (base > 1).
# generate a linear envelope with attack in the first 10% gen_env 5000, [0 1 2 9 10], [0 1 0.6 0.6 0];
Simple ADSR envelope generator. The sustain-level
is the amplitude (0 to 1) of the sustain level. The other for parameters give the relative interval times, in any unit you like, only their relative ratios are important. Any of these times might be zero, in which case the corresponding part is omitted from the envelope.
gen_asymmetric_fm
provides a way around the symmetric spectra normally produced by FM. See Palamin and Palamin, "A Method of Generating and Controlling Asymmetrical Spectra" JAES vol 36, no 9, Sept 88, p671-685.
Generates a sum of n
cosines (1 + 2(cos(x) + cos(2x) + ... cos(nx)) = sin((n+.5)x) / sin(x/2))
. Other arguments are similar to to gen_oscil
.
gen_sine_summation
provides a kind of additive synthesis. See J.A.Moorer, "Signal Processing Aspects of Computer Music" and "The Synthesis of Complex Audio Spectra by means of Discrete Summation Formulae" (Stan-M-5). The basic idea is very similar to that used in gen_sum_of_cosines generator.
The default value for nsines
is 1 (but zero is a valid value), for a
is 0.5 and for b_ratio
is 1.
(btw, either my formula is broken or the output indeed does not lie between -1 and +1, but rather -5 .. +5).
gen_from_table
generates a waveform by repeating a waveform given in table
, linearly interpolating between successive points of the waveform
.
Take a (perl or pdl) list of (integer) partials
and a list of amplitudes
and generate a single wave shape that results by adding these partial sines.
This could (and should) be used by the gen_from_table
generator.
Take a (perl list or pdl) list of (possibly noninteger) partials
and a list of amplitudes
and generate the waveform resulting by summing up all these partial sines.
apply a two pole filter (given in polar form). The filter has two poles, one at (radius,frequency), the other at (radius,-frequency). Radius is between 0 and 1 (but less than 1), and frequency is between 0 and 0.5. This is the standard resonator form with poles specified by the polar coordinates of one pole.
apply a two zero filter (given in polar form). See filter_ppolar
.
partials2polynomial
takes a list of harmonic amplitudes and returns a list of Chebychev polynomial coefficients. The argument kind
determines which kind of Chebychev polynomial we are interested in, 1st kind or 2nd kind. (default is 1).
ring modulates in1 with in2 (this is just a multiply).
amplitude modulates am_carrier and in2 with in1 (this calculates in1 * (am_carrier + in2)).
Apply a fir (finite impulse response) filter to input
. This is the same as calling:
filter_sir input, xcoeffs, pdl()
Apply a iir (infinite impulse response) filter to input
. This is just another way of saying:
filter_sir input, pdl(1), ycoeffs
That is, the first member of ycoeffs
is being ignored AND SHOULD BE SPECIFIED AS ONE FOR FUTURE COMPATIBILITY!
Apply a comb filter to the piddle input
. This is implemented using a delay line of length delay
(which must be 1 or larger and can be non-integer) and a feedback scaler.
y(n) = x(n-size-1) + scaler * y(n-size)
cf. filter_notch
and http://www.harmony-central.com/Effects/Articles/Reverb/comb.html
Apply a comb filter to the piddle input
. This is implemented using a delay line of length delay
(which must be 1 or larger and can be non-integer) and a feedforward scaler.
y(n) = x(n-size-1) * scaler + y(n-size)
As a rule of thumb, the decay time of the feedback part is 7*delay/(1-scaler)
samples, so to get a decay of Dur seconds, scaler <= 1-7*delay/(Dur*Srate)
. The peak gain is 1/(1-(abs scaler))
. The peaks (or valleys in notch's case) are evenly spaced at srate/delay
. The height (or depth) thereof is determined by scaler -- the closer to 1.0, the more pronounced. See Julius Smith's "An Introduction to Digital Filter Theory" in Strawn "Digital Audio Signal Processing", or Smith's "Music Applications of Digital Waveguides"
filter_allpass
or "moving average comb" is just like filter_comb
but with an added feedforward term. If scaler-feedback == 0
, we get a moving average comb filter. If both scaler terms == 0, we get a pure delay line.
y(n) = feedforward*x(n-1) + x(n-size-1) + feedback*y(n-size)
cf. http://www.harmony-central.com/Effects/Articles/Reverb/allpass.html
Calculates the optimal (in the Chebyshev/minimax sense) FIR filter impulse response given a set of band edges, the desired reponse on those bands, and the weight given to the error in those bands, using the Parks-McClellan exchange algorithm.
The first argument sets the filter size: design_remez_fir
returns as many coefficients as specified by this parameter.
bands
is a vector of band edge pairs (start - end), which specify the start and end of the bands in the filter specification. These must be non-overlapping and sorted in increasing order. Only values between 0
(0 Hz) and 0.5
(the Nyquist frequency) are allowed.
des
specifies the desired gain in these bands.
weight
can be used to give each band a different weight. If absent, a vector of ones is used.
type
is any of the exported constants BANDPASS
, DIFFERENTIATOR
or HILBERT
and can be used to select various design types (use BANDPASS
until this is documented ;)
Generic sampling rate conversion, implemented by convoluting input
with a sinc function of size width
(default when unspecified or zero: 5).
srate
determines the input rate / output rate ratio, i.e. values > 1 speed up, values < 1 slow down. Values < 0 are allowed and reverse the signal.
If sr_mod
is omitted, the size of the output piddle is calculcated as length(input)/abs(srate)
, e.g. it provides the full stretched or shrinked input signal.
If sr_mod
is specified it must be as large as the desired output, i.e. it's size determines the output size. Each value in sr_mod
is added to srate
at the given point in "time", so it can be used to "modulate" the sampling rate change.
# create a sound effect in the style of "Forbidden Planet" $osc = 0.3 * gen_oscil $osc, 30 / $pdl->rate; $output = filter_src($input, 1, 0, $osc);
Contrast-enhancement phase-modulates a sound file. It's like audio MSG. The actual algorithm is (applied to the normalised sound) sin(input*pi/2 + (enhancement*sin(input*2*pi)))
. The result is to brighten the sound, helping it cut through a huge mix.
filter_granulate
"granulates" the sound file file. It is the poor man's way to change the speed at which things happen in a recorded sound without changing the pitches. It works by slicing the input file into short pieces, then overlapping these slices to lengthen (or shorten) the result; this process is sometimes known as granular synthesis, and is similar to the "freeze" function. The duration of each slice is length
-- the longer, the more like reverb the effect. The portion of the length (on a scale from 0 to 1.0) spent on each ramp (up or down) is ramp
. This can control the smoothness of the result of the overlaps. The more-or-less average time between successive segments is hop
. The accuracy at which we handle this hopping is set by the float jitter
-- if jitter
is very small, you may get an annoying tremolo. The overall amplitude scaler on each segment is scaler
-- this is used to try to to avoid overflows as we add all these zillions of segments together. expansion
determines the input hop in relation to the output hop; an expansion-amount of 2.0
should more or less double the length of the original, whereas an expansion-amount of 1.0
should return something close to the original speed.
The defaults for the arguments/options are:
expansion 1.0 length(*) 0.15 scaler 0.6 hop(*) 0.05 ramp 0.4 jitter(*) 0.5 maxsize infinity
The parameters/options marked with (*) actually depend on the sampling rate, and are always multiplied by the rate
attribute of the piddle internally. If the piddle lacks that attribute, 44100 is assumed. NOTE: This is different to most other filters, but should be ok since filter_granulate
only makes sense for audiofiles.
Generate a mix of all given piddles. The resulting piddle will contain the sum of all data-piddles at their respective positions, so some scaling will be necessary before or after the mixing operation (e.g. scale2short).
# mix the sound gong1 at position 0, the sound bass5 at position 22100 # and gong2 at position 44100. The resulting piddle will be large enough # to accomodate all the sounds: $mix = audiomix 0, $gong1, 44100, $gong2, 22100, $gong2
Normalize the piddle so that it is centered around y = 0
and has maximal amplitude of 1.
This method takes a sound in any format (preferably float or double) and scales it to fit into a signed short value, suitable for playback using playudio
or similar functions.
Creates and returns a specific fft window. The type
is any of the following. These are (case-insensitive) strings, so you might need to quote them.
RECTANGULAR just ones (the identity window) HANNING 0.50 - 0.50 * cos (0 .. 2pi) HAMMING 0.54 - 0.46 * cos (0 .. 2pi) WELCH 1 - (-1 .. 1) ** 2 PARZEN the triangle window BARTLETT the symmetric triangle window BLACKMAN2 blackman-harris window of order 2 BLACKMAN3 blackman-harris window of order 3 BLACKMAN4 blackman-harris window of order 4 EXPONENTIAL the exponential window KAISER the kaiser/bessel window (using the parameter C<beta>) CAUCHY the cauchy window (using the parameter <beta>) POISSON the poisson window (exponential using parameter C<beta>) RIEMANN the riemann window (sinc) GAUSSIAN the gaussian window of order C<beta>) TUKEY the tukey window (C<beta> specifies how much of the window consists of ones). COSCOS the cosine-squared window (a partition of unity) SINC same as RIEMANN HANN same as HANNING (his name was Hann, not Hanning) LIST this "type" is special in that it returns a list of all types
Do a (complex fft) of real
(extended to complex so that the imaginary part is zero), and return the complex fft result. This function tries to use PDL::FFTW (which is faster for large vectors) when available, and falls back to PDL::FFT, which is likely to return different phase signs (due to different kernel functions), so beware! In fact, since rfft
has to shuffle the data when using PDL::FFT, the fallback is always slower.
When using PDL::FFTW, a wisdom file ~/.pdl_wisdom is used and updated, if possible.
The inverse transformation (see rfft
). irfft rfft $pdl == $pdl
always holds.
Returns the spectrum of a given pdl. If norm
is absent (or undef
), it returns the magnitude of the fft of data
. When norm
== 1 (or eq 'NORM'
, case-insensitive), it returns the magnitude, normalized to be between zero and one. If norm
== 0 (or eq 'dB'
, case-insensitive), then it returns the magnitude in dB.
data
is multiplied with window
(if not undef
) before calculating the fft, and usually contains a window created with gen_fft_window
(using beta
). If window
is a string, it is handed over to gen_fft_window
(together with the beta parameter) to create a window of suitable size.
This function could be slightly faster.
This is not really an audio-related function. It simply takes all piddles and concats them into a larger one. At the moment it only supports single-dimensional piddles and is implemented quite slowly using perl and data-copying, but that might change...
Shift vector elements without wrap and fill the free space with a constant. Flows data back & forth, for values that overlap.
Positive values shift right, negative values shift left.
EOD Pars => 'x(n); int shift(); c(); [oca]y(n)', DefaultFlow => 1, Reversible => 1, PMCode => ' sub PDL::rshift { my @a = @_; if($#a == 3) { &PDL::_rshift_int(@a);@a=(); } elsif($#a == 1 || $#a == 2) { $a[3] = PDL->nullcreate($a[0]); &PDL::_rshift_int(@a); $a[3]; } else { barf "Invalid number of arguments for shiftin"; } } ', Code=>' int i,j; int n_size = $SIZE(n); for(i = -$shift(), j=0; j < n_size; i++, j++) $y(n=>j) = i >= 0 && i < n_size ? $x(n=>i) : $c(); ', BackCode=>' int i,j; int n_size = $SIZE(n); for(i = -$shift(), j=0; j < n_size; i++, j++) if (i >= 0 && i < n_size) $x(n=>i) = $y(n=>j); ' ;
pp_def 'polynomial', Pars => 'coeffs(n); x(m); [o]out(m)', Doc => 'evaluate the polynomial with coefficients coeffs
at the position(s) x
. coeffs[0]
is the constant term.', Code => q! loop(m) %{ $GENERIC() x = 1; $GENERIC() o = 0;
loop(n) %{ o += $coeffs() * x; x *= $x(); %} $out() = o; %} ! ;
pp_def 'linear_interpolate', Pars => 'x(); fx(n); fy(n); [o]y()', GenericTypes => [L,F,D], Doc => ' Look up the ordinate x
in the function given by fx
and fy
and return a linearly interpolated value (somewhat optimized for many lookups).
fx
specifies the ordinates (x-coordinates) of the function and most be sorted in increasing order. fy
are the y-coordinates of the function in these points. ', Code => q! int d = 0; int D = $SIZE(n) - 1; /* a tribute to PP's stupidity */ $GENERIC() xmin =$fx(n=>0); $GENERIC() xmax =$fx(n=>D);
threadloop %{ $GENERIC() x = $x(); $GENERIC() x1, x2, y1, y2; if (x <= xmin) $y() = $fy(n=>0); else if (x >= xmax) $y() = $fy(n=>D); else { while ($fx(n=>d) > x) d--; while ($fx(n=>d) <= x) d++; /* 0 < d <= D */ /* fx(d-1) < x <= fx(d) */ x2 = $fx(n=>d); y2 = $fy(n=>d); d--; x1 = $fx(n=>d); y1 = $fy(n=>d); $y() = y1 + (x-x1)*(y2-y1)/(x2-x1); } %} ! ;
pp_def 'bessi0', Pars => 'a(); [o]b()', Doc => 'calculate the (approximate) modified bessel function of the first kind', GenericTypes => [F,D], Code => q! $GENERIC() x = $a(); if (x > -3.75 && x < 3.75) { $GENERIC() y = (x / 3.75); y *= y; $b() = 1 + (y * (3.5156229 + (y * (3.0899424 + (y * (1.2067492 + (y * (0.2659732 + (y * (0.360768e-1 + (y * 0.45813e-2))))))))))); } else { $GENERIC() ax = x < 0 ? -x : x; $GENERIC() y = ax / 3.75; $b() = ((exp(ax) / sqrt(ax)) * (0.39894228 + (y * (0.1328592e-1 + (y * (0.225319e-2 + (y * (-0.157565e-2 + (y * (0.916281e-2 + (y * (-0.2057706e-1 + (y * (0.2635537e-1 + (y * (-0.1647633e-1 + (y * 0.392377e-2))))))))))))))))); } !, ;
pp_def 'fast_sin', Pars => 'r(n); [o]s(n)', GenericTypes => [F,D], Doc => 'fast sine function (inaccurate table lookup with ~12 bits precision)', Code => q^ # define SINE_SIZE 16384 static float *sine_table = 0;
if (!sine_table) { int i; Float phase, incr; sine_table = (float *) calloc (SINE_SIZE + 1, sizeof (float)); incr = M_2PI / (Float) SINE_SIZE; for (i = 0, phase = 0.0; i < SINE_SIZE + 1; i++, phase += incr) sine_table[i] = (float) sin (phase); } threadloop %{ loop(n) %{ $s() = sine_table[((int)($r() * (SINE_SIZE / M_2PI)) % SINE_SIZE + SINE_SIZE) % SINE_SIZE]; %} %}^ ;
pp_addpm {At => Bot}, <<'EOD';
Marc Lehmann <schmorp@schmorp.de>. The ideas were mostly taken from common lisp music (CLM), by Bill Schottstaedt bil@ccrma.stanford.edu
. I also borrowed many explanations (and references) from the clm docs and some code from clm.c. Highly inspiring!
perl(1), PDL.