Configuration parameters
These are the parameters currently recognized by pocketsphinx.Config and pocketsphinx.Decoder along with their default values.
- Config(*args, **kwargs)
Create a PocketSphinx configuration from keyword arguments described below. For example:
config = Config(hmm="path/to/things", dict="my.dict")
The same keyword arguments can also be passed directly to the constructor for pocketsphinx.Decoder.
Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:
config = Config(lm=None)  # Do not load a language model
Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl is set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This is not the case if you decide to set one of them in an existing Config, so in that case you must make sure to set lm to None:
config["jsgf"] = "spam_eggs_and_spam.gram"
config["lm"] = None
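The "only one search mode" rule can be checked before handing settings to a Decoder. The following is a minimal sketch that works on a plain dict and does not call PocketSphinx at all; the helper name conflicting_search_keys and the file name default.lm.bin are hypothetical and are not part of the library.

```python
# Parameters that each select a search mode; per the text above,
# at most one of them may be set at a time.
SEARCH_KEYS = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")


def conflicting_search_keys(settings):
    """Return the search-mode keys that are set (not None) in a plain dict."""
    return [key for key in SEARCH_KEYS if settings.get(key) is not None]


# Hypothetical settings: a language model is already present, then a grammar is added.
settings = {"lm": "default.lm.bin", "jsgf": "spam_eggs_and_spam.gram"}
print(conflicting_search_keys(settings))  # two keys set: this would be rejected

# Clearing lm leaves a single search mode, as recommended above.
settings["lm"] = None
print(conflicting_search_keys(settings))
```

A list with more than one entry signals a configuration that, according to the description above, would make Decoder initialization fail.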
- Keyword Arguments
hmm (str) – Directory containing acoustic model files.
logspec (bool) – Write out logspectral files instead of cepstra, defaults to
False
smoothspec (bool) – Write out cepstral-smoothed logspectral files, defaults to
False
transform (str) – Which type of transform to use to calculate cepstra (legacy, dct, or htk), defaults to
legacy
alpha (float) – Preemphasis parameter, defaults to
0.97
samprate (int) – Sampling rate, defaults to
16000
frate (int) – Frame rate, defaults to
100
wlen (float) – Hamming window length, defaults to
0.025625
nfft (int) – Size of FFT, or 0 to set automatically (recommended), defaults to
0
nfilt (int) – Number of filter banks, defaults to
40
lowerf (float) – Lower edge of filters, defaults to
133.33334
upperf (float) – Upper edge of filters, defaults to
6855.4976
unit_area (bool) – Normalize mel filters to unit area, defaults to
True
round_filters (bool) – Round mel filter frequencies to DFT points, defaults to
True
ncep (int) – Number of cep coefficients, defaults to
13
doublebw (bool) – Use double bandwidth filters (same center freq), defaults to
False
lifter (int) – Length of sin-curve for liftering, or 0 for no liftering, defaults to
0
input_endian (str) – Endianness of input data, big or little, ignored if NIST or MS Wav, defaults to
little
warp_type (str) – Warping function type (or shape), defaults to
inverse_linear
warp_params (str) – Parameters defining the warping function
dither (bool) – Add 1/2-bit noise, defaults to
False
seed (int) – Seed for random number generator; if less than zero, pick our own, defaults to
-1
remove_dc (bool) – Remove DC offset from each frame, defaults to
False
remove_noise (bool) – Remove noise using spectral subtraction, defaults to
False
verbose (bool) – Show input filenames, defaults to
False
feat (str) – Feature stream type, depends on the acoustic model, defaults to
1s_c_d_dd
ceplen (int) – Number of components in the input feature vector, defaults to
13
cmn (str) – Cepstral mean normalization scheme (‘live’, ‘batch’, or ‘none’), defaults to
live
cmninit (str) – Initial values (comma-separated) for cepstral mean when ‘live’ is used, defaults to
40,3,-1
varnorm (bool) – Variance normalize each utterance (only if CMN == current), defaults to
False
agc (str) – Automatic gain control for c0 (‘max’, ‘emax’, ‘noise’, or ‘none’), defaults to
none
agcthresh (float) – Initial threshold for automatic gain control, defaults to
2.0
lda (str) – File containing transformation matrix to be applied to features (single-stream features only)
ldadim (int) – Dimensionality of output of feature transformation (0 to use entire matrix), defaults to
0
svspec (str) – Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
featparams (str) – File containing feature extraction parameters.
mdef (str) – Model definition input file
senmgau (str) – Senone to codebook mapping input file (usually not needed)
tmat (str) – HMM state transition matrix input file
tmatfloor (float) – HMM state transition probability floor (applied to -tmat file), defaults to
0.0001
mean (str) – Mixture gaussian means input file
var (str) – Mixture gaussian variances input file
varfloor (float) – Mixture gaussian variance floor (applied to data from -var file), defaults to
0.0001
mixw (str) – Senone mixture weights input file (uncompressed)
mixwfloor (float) – Senone mixture weights floor (applied to data from -mixw file), defaults to
1e-07
aw (int) – Inverse weight applied to acoustic scores, defaults to
1
sendump (str) – Senone dump (compressed mixture weights) input file
mllr (str) – MLLR transformation to apply to means and variances
mmap (bool) – Use memory-mapped I/O (if possible) for model files, defaults to
True
ds (int) – Frame GMM computation downsampling ratio, defaults to
1
topn (int) – Maximum number of top Gaussians to use in scoring, defaults to
4
topn_beam (str) – Beam width used to determine top-N Gaussians (or a list, per-feature), defaults to
0
logbase (float) – Base in which all log-likelihoods are calculated, defaults to
1.0001
beam (float) – Beam width applied to every frame in Viterbi search (smaller values mean wider beam), defaults to
1e-48
wbeam (float) – Beam width applied to word exits, defaults to
7e-29
pbeam (float) – Beam width applied to phone transitions, defaults to
1e-48
lpbeam (float) – Beam width applied to last phone in words, defaults to
1e-40
lponlybeam (float) – Beam width applied to last phone in single-phone words, defaults to
7e-29
fwdflatbeam (float) – Beam width applied to every frame in second-pass flat search, defaults to
1e-64
fwdflatwbeam (float) – Beam width applied to word exits in second-pass flat search, defaults to
7e-29
pl_window (int) – Phoneme lookahead window size, in frames, defaults to
5
pl_beam (float) – Beam width applied to phone loop search for lookahead, defaults to
1e-10
pl_pbeam (float) – Beam width applied to phone loop transitions for lookahead, defaults to
1e-10
pl_pip (float) – Phone insertion penalty for phone loop, defaults to
1.0
pl_weight (float) – Weight for phoneme lookahead penalties, defaults to
3.0
compallsen (bool) – Compute all senone scores in every frame (can be faster when there are many senones), defaults to
False
fwdtree (bool) – Run forward lexicon-tree search (1st pass), defaults to
True
fwdflat (bool) – Run forward flat-lexicon search over word lattice (2nd pass), defaults to
True
bestpath (bool) – Run bestpath (Dijkstra) search over word lattice (3rd pass), defaults to
True
backtrace (bool) – Print results and backtraces to log, defaults to
False
latsize (int) – Initial backpointer table size, defaults to
5000
maxwpf (int) – Maximum number of distinct word exits at each frame (or -1 for no pruning), defaults to
-1
maxhmmpf (int) – Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to
30000
min_endfr (int) – Nodes ignored in lattice construction if they persist for fewer than N frames, defaults to
0
fwdflatefwid (int) – Minimum number of end frames for a word to be searched in fwdflat search, defaults to
4
fwdflatsfwin (int) – Window of frames in lattice to search for successor words in fwdflat search, defaults to
25
dict (str) – Main pronunciation dictionary (lexicon) input file
fdict (str) – Noise word pronunciation dictionary input file
dictcase (bool) – Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only), defaults to
False
allphone (str) – Perform phoneme decoding with phonetic lm (given here)
allphone_ci (bool) – Perform phoneme decoding with phonetic lm and context-independent units only, defaults to
True
lm (str) – Word trigram language model input file
lmctl (str) – Specify a set of language models
lmname (str) – Which language model in -lmctl to use by default
lw (float) – Language model probability weight, defaults to
6.5
fwdflatlw (float) – Language model probability weight for flat lexicon (2nd pass) decoding, defaults to
8.5
bestpathlw (float) – Language model probability weight for bestpath search, defaults to
9.5
ascale (float) – Inverse of acoustic model scale for confidence score calculation, defaults to
20.0
wip (float) – Word insertion penalty, defaults to
0.65
nwpen (float) – New word transition penalty, defaults to
1.0
pip (float) – Phone insertion penalty, defaults to
1.0
uw (float) – Unigram weight, defaults to
1.0
silprob (float) – Silence word transition probability, defaults to
0.005
fillprob (float) – Filler word transition probability, defaults to
1e-08
fsg (str) – Sphinx format finite state grammar file
jsgf (str) – JSGF grammar file
toprule (str) – Start rule for JSGF (first public rule is default)
fsgusealtpron (bool) – Add alternate pronunciations to FSG, defaults to
True
fsgusefiller (bool) – Insert filler words at each state, defaults to
True
keyphrase (str) – Keyphrase to spot
kws (str) – A file with keyphrases to spot, one per line
kws_plp (float) – Phone loop probability for keyphrase spotting, defaults to
0.1
kws_delay (int) – Delay to wait for best detection score, defaults to
10
kws_threshold (float) – Threshold for p(hyp)/p(alternatives) ratio, defaults to
1e-30
logfn (str) – File to write log messages in
loglevel (str) – Minimum level of log messages (DEBUG, INFO, WARN, ERROR), defaults to
WARN
mfclogdir (str) – Directory to log feature files to
rawlogdir (str) – Directory to log raw audio files to
senlogdir (str) – Directory to log senone score files to
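To get a feel for some of the numeric defaults above, here is a rough sketch in plain Python (it does not import PocketSphinx). It derives the analysis window and frame shift in samples from samprate, wlen and frate, and expresses a beam width as a log value in base logbase. Two assumptions are made: that nfft=0 selects the smallest power of two able to hold one window, and that beams are compared in the log domain defined by logbase. The helper names frame_geometry and beam_to_log are hypothetical.

```python
import math


def frame_geometry(samprate=16000, frate=100, wlen=0.025625, nfft=0):
    """Return (window_samples, shift_samples, fft_size) for the given settings."""
    window = int(round(wlen * samprate))   # samples per analysis window
    shift = int(round(samprate / frate))   # samples between frame starts
    if nfft == 0:
        # Assumption: "set automatically" means the smallest power of two
        # that can hold one analysis window.
        nfft = 1
        while nfft < window:
            nfft *= 2
    return window, shift, nfft


def beam_to_log(beam, logbase=1.0001):
    """Express a probability-domain beam as a log value in base `logbase`."""
    return math.log(beam) / math.log(logbase)


print(frame_geometry())
for name, value in [("beam", 1e-48), ("wbeam", 7e-29)]:
    print(name, round(beam_to_log(value)))
```

Under these assumptions the defaults give a 410-sample window, a 160-sample shift and a 512-point FFT. The smaller beam value maps to a more negative log threshold, which matches the note that smaller values mean a wider beam.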