Speech::Recognizer::SPX - Perl extension for the Sphinx2 speech recognizer
use Speech::Recognizer::SPX qw(:fbs :uttproc);
fbs_init([arg1 => $val, arg2 => $val, ...]);
uttproc_begin_utt();
uttproc_end_utt();
fbs_end();
This module provides a Perl interface to the Sphinx-II speech recognizer library.
Warning! This interface is subject to change. It's currently a bit clunky because of the way the Sphinx-II library is structured, and that will probably change (for the better, I hope) over time.
When the interface changes, future versions of this documentation will point out how it has changed and how to deal with this.
use Speech::Recognizer::SPX qw(:fbs :uttproc :lm);
Because most parts of the Sphinx-II library contain a lot of global internal state, it makes no sense to use an object-oriented interface at this time. However, I don't want to clobber your namespace with a billion functions you may or may not use. To make things easier on your typing hands, the available functions have been grouped into tags representing modules inside the library itself. These tags and the functions they import are listed below.
:fbs
This is something of a misnomer: FBS stands for Fast Beam Search, but in fact this module (the fbs_main.c file in Sphinx-II) just wraps around the other modules in Sphinx (one of which actually does fast beam search :-) and initializes the recognizer for you. Functions imported by this tag are:
fbs_init fbs_end
:uttproc
This is the utterance processing module. You feed it data (either raw audio data or feature data - which currently means vectors of mel-frequency cepstral coefficients), and it feeds back search hypotheses based on a language model. Functions imported by this tag are:
uttfile_open uttproc_begin_utt uttproc_rawdata uttproc_cepdata uttproc_end_utt uttproc_abort_utt uttproc_stop_utt uttproc_restart_utt uttproc_result uttproc_partial_result uttproc_get_uttid uttproc_set_auto_uttid_prefix uttproc_set_lm uttproc_lmupdate uttproc_set_context uttproc_set_rawlogdir uttproc_set_mfclogdir uttproc_set_logfile
:lm
This is the language model module. It loads and unloads language models.
lm_read lm_delete
fbs_init(\@args);
The fbs_init function is the main entry point to the Sphinx library. If given no arguments, it will snarf options from the global @ARGV array (because that's what its C equivalent does). To make life easier, and to entice people to write Sphinx programs in Perl instead of C, you can instead pass a reference to an array whose contents are arranged the same way @ARGV would be, i.e. a list of option/value pairs.
To make things pretty, you can use the magical => operator, like this:
fbs_init([samp => 16000, datadir => '/foo/bar/baz']);
Note that you can omit the leading dash from argument names (if you like).
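To make the option/value layout concrete, here is a small sketch of how such a list lines up with a C-style argument vector. The argv_style helper is hypothetical and is not part of this module; it only illustrates the convention described above and assumes nothing about how fbs_init treats its arguments internally.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Speech::Recognizer::SPX): turn an
# option/value list into the dash-prefixed form a C-style argv would use.
sub argv_style {
    my @pairs = @_;
    die "odd number of elements in option list\n" if @pairs % 2;
    my @out;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        # Add the leading dash only when the caller left it off.
        $name = "-$name" unless $name =~ /^-/;
        push @out, $name, $value;
    }
    return @out;
}

# Names with and without the leading dash end up in the same form.
my @args = argv_style(samp => 16000, -datadir => '/foo/bar/baz');
print "@args\n";   # -samp 16000 -datadir /foo/bar/baz
```

Since fbs_init accepts names without the dash, a helper like this should not be needed in practice; it is shown only to spell out what "arranged in the same way @ARGV might be" means.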
Calling this function will block your process for a long time and print unbelievable amounts of debugging gunk to STDOUT and STDERR. This will get better eventually.
This function has a large number of options. Someday they will be documented. Until then, either look in the example code, or go straight to the source, namely the param variable in src/libsphinx2/fbs_main.c and the kb_param variable in src/libsphinx2/kb_main.c.
uttproc_begin_utt() or die;
uttproc_rawdata($buf [, $block]) or die;
uttproc_cepdata(\@cepvecs [, $block]) or die;
uttproc_end_utt() or die;
To actually recognize some speech data, you use the functions exported by the :uttproc tag. Before calling any of them, you must successfully call uttproc_begin_utt, or Bad Things are certain to happen (I can't speculate on exactly what things, but I'm sure they're bad).
You should call uttproc_begin_utt before each distinct utterance (to the extent that you can predict when individual utterances begin or end, of course...), and uttproc_end_utt at the end of each.
After calling uttproc_begin_utt, you can pass either raw audio data or cepstral feature vectors (see Audio::MFCC), using uttproc_rawdata or uttproc_cepdata, respectively. Due to the way feature extraction works, you cannot mix the two types of data within the same utterance.
If live mode is in effect (i.e. -livemode => TRUE was passed to fbs_init), the optional $block parameter controls whether these functions will return immediately after processing a single frame of data, or whether they will process all pending frames of data. If you need partial results, you probably want to pass a non-zero value (FIXME: should be a true value but I don't know how to test for truth in XS code) for $block, though this may increase latency elsewhere in the system.
Unfortunately, it appears that there is no specific function to flush all unprocessed frames before getting a partial result. Calling uttproc_rawdata with an empty $buf and $block non-zero seems to have the desired effect.
my ($frames, $hypothesis) = uttproc_result($block);
my ($frames, $hypothesis) = uttproc_partial_result();
At any point during utterance processing, you may call uttproc_partial_result to obtain the current "best guess". Note that this function does not flush unprocessed frames, so you might want to use the trick mentioned above to do so before calling it if you are operating in non-blocking mode.
By contrast, you may not call uttproc_result until after you have called uttproc_end_utt (or uttproc_abort_utt, or possibly uttproc_stop_utt). The $block flag is also optional here, but I strongly suggest you use it.
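The calls described above can be strung together roughly as follows. This is only a sketch: the feed_chunks helper, the 4096-byte chunk size and the file name are assumptions made for illustration, and the helper takes a callback so that the chunking logic can be tried without the recognizer loaded. The commented usage relies only on the functions documented here and assumes fbs_init has already succeeded.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Speech::Recognizer::SPX): read a
# filehandle in fixed-size chunks and hand each chunk to a callback.
# Returns the number of chunks delivered.
sub feed_chunks {
    my ($fh, $chunk_bytes, $feed) = @_;
    binmode $fh;
    my $count = 0;
    while (1) {
        my $n = read($fh, my $buf, $chunk_bytes);
        die "read failed: $!\n" unless defined $n;
        last if $n == 0;                       # end of input
        $feed->($buf) or die "feed callback failed\n";
        $count++;
    }
    return $count;
}

# Stand-alone demonstration on an in-memory "file" of 10 bytes.
my $fake_audio = "\0" x 10;
open my $demo, '<', \$fake_audio or die "open: $!\n";
my $sent = feed_chunks($demo, 4, sub { 1 });
print "delivered $sent chunks\n";              # delivered 3 chunks

# With the module imported via qw(:fbs :uttproc), a real utterance
# might be processed like this (utterance.raw is a made-up name):
#
#   open my $fh, '<', 'utterance.raw' or die "open: $!";
#   uttproc_begin_utt() or die;
#   feed_chunks($fh, 4096, sub { uttproc_rawdata($_[0], 1) });
#   uttproc_rawdata('', 1) or die;             # flush pending frames
#   my ($frames, $guess) = uttproc_partial_result();
#   uttproc_end_utt() or die;
#   my ($total, $hypothesis) = uttproc_result(1);
```

Passing a non-zero $block in the callback follows the suggestion above for anyone who wants partial results; whether that trade-off suits a given application depends on its latency requirements.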
Changing language models, etc, etc... This documentation is under construction.
For now there are just some example programs in the distribution.
David Huggins-Daines <dhd@cepstral.com>
perl(1), Speech::Recognizer::SPX::Server, Audio::SPX, Audio::MFCC