Configuration parameters

These are the parameters currently recognized by pocketsphinx.Config and pocketsphinx.Decoder along with their default values.

Config(*args, **kwargs)

Create a PocketSphinx configuration from keyword arguments described below. For example:

config = Config(hmm="path/to/things", dict="my.dict")

The same keyword arguments can also be passed directly to the constructor for pocketsphinx.Decoder.

Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:

config = Config(lm=None)  # Do not load a language model
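Because the default models are only filled in when a Config is constructed directly, a JSON configuration presumably needs to name its models explicitly. The following is a minimal sketch that builds such a JSON string using only the standard library. The file paths are hypothetical placeholders. Handing the resulting string to a JSON-parsing entry point such as Config.parse_json is an assumption about the installed PocketSphinx version and is not exercised here.

```python
import json

# Hypothetical model locations; replace with real paths on your system.
params = {
    "hmm": "path/to/acoustic/model",
    "lm": "path/to/model.lm.bin",
    "dict": "path/to/my.dict",
    "samprate": 16000,
}

# The keys are the same keyword argument names documented for Config.
config_json = json.dumps(params, indent=2)
print(config_json)
```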

Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl is set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This does not happen when you set one of them on an existing Config, so in that case you must make sure to set lm to None yourself:

config["jsgf"] = "spam_eggs_and_spam.gram"
config["lm"] = None
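The rule described above can be sketched in plain Python. This illustrates the documented behaviour and is not PocketSphinx's actual implementation: the helper names, the SEARCH_MODES tuple, and the "default.lm" placeholder are all hypothetical.

```python
# The mutually exclusive search-mode parameters named in the text above.
SEARCH_MODES = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")


def active_search_modes(settings):
    """Return the search-mode keys that are set to something other than None."""
    return [key for key in SEARCH_MODES if settings.get(key) is not None]


def resolve_search_mode(settings, default_lm="default.lm"):
    """Mimic construction-time handling of the default language model.

    The default lm is added only when the caller chose no search mode and
    did not explicitly pass lm=None. More than one active mode is an error.
    """
    resolved = dict(settings)
    active = active_search_modes(resolved)
    if len(active) > 1:
        raise ValueError("conflicting search modes: " + ", ".join(active))
    if not active and "lm" not in resolved:
        resolved["lm"] = default_lm  # hypothetical stand-in for the default model
    return resolved


# Construction with a grammar: no default lm is added.
print(resolve_search_mode({"jsgf": "spam_eggs_and_spam.gram"}))

# Setting a grammar on an existing configuration leaves lm in place,
# which is the conflict the text warns about.
config = resolve_search_mode({})
config["jsgf"] = "spam_eggs_and_spam.gram"
print(active_search_modes(config))  # ['lm', 'jsgf']
config["lm"] = None
print(active_search_modes(config))  # ['jsgf']
```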

Keyword Arguments:

hmm (str) – Directory containing acoustic model files.
logspec (bool) – Write out logspectral files instead of cepstra, defaults to False
smoothspec (bool) – Write out cepstral-smoothed logspectral files, defaults to False
transform (str) – Which type of transform to use to calculate cepstra (legacy, dct, or htk), defaults to legacy
alpha (float) – Preemphasis parameter, defaults to 0.97
samprate (int) – Sampling rate, defaults to 16000
frate (int) – Frame rate, defaults to 100
wlen (float) – Hamming window length, defaults to 0.025625
nfft (int) – Size of FFT, or 0 to set automatically (recommended), defaults to 0
nfilt (int) – Number of filter banks, defaults to 40
lowerf (float) – Lower edge of filters, defaults to 133.33334
upperf (float) – Upper edge of filters, defaults to 6855.4976
unit_area (bool) – Normalize mel filters to unit area, defaults to True
round_filters (bool) – Round mel filter frequencies to DFT points, defaults to True
ncep (int) – Number of cepstral coefficients, defaults to 13
doublebw (bool) – Use double bandwidth filters (same center freq), defaults to False
lifter (int) – Length of sin-curve for liftering, or 0 for no liftering, defaults to 0
input_endian (str) – Endianness of input data, big or little, ignored if NIST or MS Wav, defaults to little
warp_type (str) – Warping function type (or shape), defaults to inverse_linear
warp_params (str) – Parameters defining the warping function
dither (bool) – Add 1/2-bit noise, defaults to False
seed (int) – Seed for random number generator; if less than zero, pick our own, defaults to -1
remove_dc (bool) – Remove DC offset from each frame, defaults to False
remove_noise (bool) – Remove noise using spectral subtraction, defaults to False
verbose (bool) – Show input filenames, defaults to False
feat (str) – Feature stream type, depends on the acoustic model, defaults to 1s_c_d_dd
ceplen (int) – Number of components in the input feature vector, defaults to 13
cmn (str) – Cepstral mean normalization scheme (‘live’, ‘batch’, or ‘none’), defaults to live
cmninit (str) – Initial values (comma-separated) for cepstral mean when ‘live’ is used, defaults to 40,3,-1
varnorm (bool) – Variance normalize each utterance (only if cmn is ‘batch’), defaults to False
agc (str) – Automatic gain control for c0 (‘max’, ‘emax’, ‘noise’, or ‘none’), defaults to none
agcthresh (float) – Initial threshold for automatic gain control, defaults to 2.0
lda (str) – File containing transformation matrix to be applied to features (single-stream features only)
ldadim (int) – Dimensionality of output of feature transformation (0 to use entire matrix), defaults to 0
svspec (str) – Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
featparams (str) – File containing feature extraction parameters.
mdef (str) – Model definition input file
senmgau (str) – Senone to codebook mapping input file (usually not needed)
tmat (str) – HMM state transition matrix input file
tmatfloor (float) – HMM state transition probability floor (applied to -tmat file), defaults to 0.0001
mean (str) – Mixture gaussian means input file
var (str) – Mixture gaussian variances input file
varfloor (float) – Mixture gaussian variance floor (applied to data from -var file), defaults to 0.0001
mixw (str) – Senone mixture weights input file (uncompressed)
mixwfloor (float) – Senone mixture weights floor (applied to data from -mixw file), defaults to 1e-07
aw (int) – Inverse weight applied to acoustic scores, defaults to 1
sendump (str) – Senone dump (compressed mixture weights) input file
mllr (str) – MLLR transformation to apply to means and variances
mmap (bool) – Use memory-mapped I/O (if possible) for model files, defaults to True
ds (int) – Frame GMM computation downsampling ratio, defaults to 1
topn (int) – Maximum number of top Gaussians to use in scoring, defaults to 4
topn_beam (str) – Beam width used to determine top-N Gaussians (or a list, per-feature), defaults to 0
logbase (float) – Base in which all log-likelihoods are calculated, defaults to 1.0001
beam (float) – Beam width applied to every frame in Viterbi search (smaller values mean wider beam), defaults to 1e-48
wbeam (float) – Beam width applied to word exits, defaults to 7e-29
pbeam (float) – Beam width applied to phone transitions, defaults to 1e-48
lpbeam (float) – Beam width applied to last phone in words, defaults to 1e-40
lponlybeam (float) – Beam width applied to last phone in single-phone words, defaults to 7e-29
fwdflatbeam (float) – Beam width applied to every frame in second-pass flat search, defaults to 1e-64
fwdflatwbeam (float) – Beam width applied to word exits in second-pass flat search, defaults to 7e-29
pl_window (int) – Phoneme lookahead window size, in frames, defaults to 5
pl_beam (float) – Beam width applied to phone loop search for lookahead, defaults to 1e-10
pl_pbeam (float) – Beam width applied to phone loop transitions for lookahead, defaults to 1e-10
pl_pip (float) – Phone insertion penalty for phone loop, defaults to 1.0
pl_weight (float) – Weight for phoneme lookahead penalties, defaults to 3.0
compallsen (bool) – Compute all senone scores in every frame (can be faster when there are many senones), defaults to False
fwdtree (bool) – Run forward lexicon-tree search (1st pass), defaults to True
fwdflat (bool) – Run forward flat-lexicon search over word lattice (2nd pass), defaults to True
bestpath (bool) – Run bestpath (Dijkstra) search over word lattice (3rd pass), defaults to True
backtrace (bool) – Print results and backtraces to log, defaults to False
latsize (int) – Initial backpointer table size, defaults to 5000
maxwpf (int) – Maximum number of distinct word exits at each frame (or -1 for no pruning), defaults to -1
maxhmmpf (int) – Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to 30000
min_endfr (int) – Nodes ignored in lattice construction if they persist for fewer than N frames, defaults to 0
fwdflatefwid (int) – Minimum number of end frames for a word to be searched in fwdflat search, defaults to 4
fwdflatsfwin (int) – Window of frames in lattice to search for successor words in fwdflat search, defaults to 25
dict (str) – Main pronunciation dictionary (lexicon) input file
fdict (str) – Noise word pronunciation dictionary input file
dictcase (bool) – Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only), defaults to False
allphone (str) – Perform phoneme decoding with phonetic lm (given here)
allphone_ci (bool) – Perform phoneme decoding with phonetic lm and context-independent units only, defaults to True
lm (str) – Word trigram language model input file
lmctl (str) – Specify a set of language models
lmname (str) – Which language model in -lmctl to use by default
lw (float) – Language model probability weight, defaults to 6.5
fwdflatlw (float) – Language model probability weight for flat lexicon (2nd pass) decoding, defaults to 8.5
bestpathlw (float) – Language model probability weight for bestpath search, defaults to 9.5
ascale (float) – Inverse of acoustic model scale for confidence score calculation, defaults to 20.0
wip (float) – Word insertion penalty, defaults to 0.65
nwpen (float) – New word transition penalty, defaults to 1.0
pip (float) – Phone insertion penalty, defaults to 1.0
uw (float) – Unigram weight, defaults to 1.0
silprob (float) – Silence word transition probability, defaults to 0.005
fillprob (float) – Filler word transition probability, defaults to 1e-08
fsg (str) – Sphinx format finite state grammar file
jsgf (str) – JSGF grammar file
toprule (str) – Start rule for JSGF (first public rule is default)
fsgusealtpron (bool) – Add alternate pronunciations to FSG, defaults to True
fsgusefiller (bool) – Insert filler words at each state, defaults to True
keyphrase (str) – Keyphrase to spot
kws (str) – A file with keyphrases to spot, one per line
kws_plp (float) – Phone loop probability for keyphrase spotting, defaults to 0.1
kws_delay (int) – Delay to wait for best detection score, defaults to 10
kws_threshold (float) – Threshold for p(hyp)/p(alternatives) ratio, defaults to 1e-30
logfn (str) – File to write log messages in
loglevel (str) – Minimum level of log messages (DEBUG, INFO, WARN, ERROR), defaults to WARN
mfclogdir (str) – Directory to log feature files to
rawlogdir (str) – Directory to log raw audio files to
senlogdir (str) – Directory to log senone score files to
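Several of the defaults above are easier to read once converted into the units the front end and search work in. The sketch below derives the frame shift and window size in samples from samprate, frate, and wlen, and expresses two beam widths as logarithms in base logbase. It rests on two assumptions that the list does not state: that nfft=0 selects the smallest power of two large enough to hold one window, and that beams are probabilities converted to the log domain using logbase. The exact rounding PocketSphinx applies internally may differ.

```python
import math

# Defaults copied from the parameter list above.
samprate = 16000   # samples per second
frate = 100        # frames per second
wlen = 0.025625    # window length in seconds
logbase = 1.0001
beam = 1e-48
wbeam = 7e-29

frame_shift = round(samprate / frate)  # samples between successive frame starts
window_size = round(wlen * samprate)   # samples in one analysis window

# Assumption: automatic nfft is the smallest power of two >= window_size.
auto_nfft = 1 << (window_size - 1).bit_length()


def to_log_domain(prob, base=logbase):
    """Express a probability as a logarithm in the given base."""
    return math.log(prob) / math.log(base)


beam_log = to_log_domain(beam)
wbeam_log = to_log_domain(wbeam)

print(frame_shift, window_size, auto_nfft)  # 160 410 512
print(round(beam_log), round(wbeam_log))
```

Under these assumptions a smaller beam value maps to a more negative log threshold, so more hypotheses fall within the threshold and survive pruning. This is consistent with the note on beam that smaller values mean a wider beam.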