Configuration parameters
These are the parameters currently recognized by
pocketsphinx.Config and pocketsphinx.Decoder along with their
default values.
- Config(*args, **kwargs)
Create a PocketSphinx configuration from keyword arguments described below. For example:
config = Config(hmm="path/to/things", dict="my.dict")
The same keyword arguments can also be passed directly to the constructor for
pocketsphinx.Decoder.
Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:
config = Config(lm=None)  # Do not load a language model
Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl is set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This is not the case if you decide to set one of them in an existing Config, so in that case you must make sure to set lm to None:
config["jsgf"] = "spam_eggs_and_spam.gram"
config["lm"] = None
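The one-search-mode rule described above can be checked before any configuration is built. The following is a minimal sketch in plain Python that does not import pocketsphinx: the helper names active_search_keys and check_search_modes are hypothetical, the file names are placeholders, and the check only mirrors the rule as stated in the text rather than the decoder's own validation.

```python
# Parameters that each select a search mode; the text above says at most
# one of them may be set when the decoder is initialized.
SEARCH_KEYS = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")


def active_search_keys(params):
    """Return the search-mode keys in `params` whose value is not None."""
    return [key for key in SEARCH_KEYS if params.get(key) is not None]


def check_search_modes(params):
    """Raise ValueError if more than one search mode is configured."""
    active = active_search_keys(params)
    if len(active) > 1:
        raise ValueError("conflicting search modes: " + ", ".join(active))
    return params


# Switching an existing set of parameters from a language model to a grammar.
# Both file names are placeholders, not files shipped with any package.
params = {"lm": "example.lm.bin", "jsgf": "spam_eggs_and_spam.gram"}
params["lm"] = None  # without this line, check_search_modes(params) raises
check_search_modes(params)
```

A dictionary that passes this check could then be unpacked into the constructor as keyword arguments, for example Config(**params).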
- Keyword Arguments:
hmm (str) – Directory containing acoustic model files.
logspec (bool) – Write out logspectral files instead of cepstra, defaults to False
smoothspec (bool) – Write out cepstral-smoothed logspectral files, defaults to False
transform (str) – Which type of transform to use to calculate cepstra (legacy, dct, or htk), defaults to legacy
alpha (float) – Preemphasis parameter, defaults to 0.97
samprate (int) – Sampling rate, defaults to 16000
frate (int) – Frame rate, defaults to 100
wlen (float) – Hamming window length, defaults to 0.025625
nfft (int) – Size of FFT, or 0 to set automatically (recommended), defaults to 0
nfilt (int) – Number of filter banks, defaults to 40
lowerf (float) – Lower edge of filters, defaults to 133.33334
upperf (float) – Upper edge of filters, defaults to 6855.4976
unit_area (bool) – Normalize mel filters to unit area, defaults to True
round_filters (bool) – Round mel filter frequencies to DFT points, defaults to True
ncep (int) – Number of cep coefficients, defaults to 13
doublebw (bool) – Use double bandwidth filters (same center freq), defaults to False
lifter (int) – Length of sin-curve for liftering, or 0 for no liftering, defaults to 0
input_endian (str) – Endianness of input data, big or little, ignored if NIST or MS Wav, defaults to little
warp_type (str) – Warping function type (or shape), defaults to inverse_linear
warp_params (str) – Parameters defining the warping function
dither (bool) – Add 1/2-bit noise, defaults to False
seed (int) – Seed for random number generator; if less than zero, pick our own, defaults to -1
remove_dc (bool) – Remove DC offset from each frame, defaults to False
remove_noise (bool) – Remove noise using spectral subtraction, defaults to False
verbose (bool) – Show input filenames, defaults to False
feat (str) – Feature stream type, depends on the acoustic model, defaults to 1s_c_d_dd
ceplen (int) – Number of components in the input feature vector, defaults to 13
cmn (str) – Cepstral mean normalization scheme (‘live’, ‘batch’, or ‘none’), defaults to live
cmninit (str) – Initial values (comma-separated) for cepstral mean when ‘live’ is used, defaults to 40,3,-1
varnorm (bool) – Variance normalize each utterance (only if CMN == current), defaults to False
agc (str) – Automatic gain control for c0 (‘max’, ‘emax’, ‘noise’, or ‘none’), defaults to none
agcthresh (float) – Initial threshold for automatic gain control, defaults to 2.0
lda (str) – File containing transformation matrix to be applied to features (single-stream features only)
ldadim (int) – Dimensionality of output of feature transformation (0 to use entire matrix), defaults to 0
svspec (str) – Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
featparams (str) – File containing feature extraction parameters.
mdef (str) – Model definition input file
senmgau (str) – Senone to codebook mapping input file (usually not needed)
tmat (str) – HMM state transition matrix input file
tmatfloor (float) – HMM state transition probability floor (applied to -tmat file), defaults to 0.0001
mean (str) – Mixture gaussian means input file
var (str) – Mixture gaussian variances input file
varfloor (float) – Mixture gaussian variance floor (applied to data from -var file), defaults to 0.0001
mixw (str) – Senone mixture weights input file (uncompressed)
mixwfloor (float) – Senone mixture weights floor (applied to data from -mixw file), defaults to 1e-07
aw (int) – Inverse weight applied to acoustic scores, defaults to 1
sendump (str) – Senone dump (compressed mixture weights) input file
mllr (str) – MLLR transformation to apply to means and variances
mmap (bool) – Use memory-mapped I/O (if possible) for model files, defaults to True
ds (int) – Frame GMM computation downsampling ratio, defaults to 1
topn (int) – Maximum number of top Gaussians to use in scoring, defaults to 4
topn_beam (str) – Beam width used to determine top-N Gaussians (or a list, per-feature), defaults to 0
logbase (float) – Base in which all log-likelihoods calculated, defaults to 1.0001
beam (float) – Beam width applied to every frame in Viterbi search (smaller values mean wider beam), defaults to 1e-48
wbeam (float) – Beam width applied to word exits, defaults to 7e-29
pbeam (float) – Beam width applied to phone transitions, defaults to 1e-48
lpbeam (float) – Beam width applied to last phone in words, defaults to 1e-40
lponlybeam (float) – Beam width applied to last phone in single-phone words, defaults to 7e-29
fwdflatbeam (float) – Beam width applied to every frame in second-pass flat search, defaults to 1e-64
fwdflatwbeam (float) – Beam width applied to word exits in second-pass flat search, defaults to 7e-29
pl_window (int) – Phoneme lookahead window size, in frames, defaults to 5
pl_beam (float) – Beam width applied to phone loop search for lookahead, defaults to 1e-10
pl_pbeam (float) – Beam width applied to phone loop transitions for lookahead, defaults to 1e-10
pl_pip (float) – Phone insertion penalty for phone loop, defaults to 1.0
pl_weight (float) – Weight for phoneme lookahead penalties, defaults to 3.0
compallsen (bool) – Compute all senone scores in every frame (can be faster when there are many senones), defaults to False
fwdtree (bool) – Run forward lexicon-tree search (1st pass), defaults to True
fwdflat (bool) – Run forward flat-lexicon search over word lattice (2nd pass), defaults to True
bestpath (bool) – Run bestpath (Dijkstra) search over word lattice (3rd pass), defaults to True
backtrace (bool) – Print results and backtraces to log, defaults to False
latsize (int) – Initial backpointer table size, defaults to 5000
maxwpf (int) – Maximum number of distinct word exits at each frame (or -1 for no pruning), defaults to -1
maxhmmpf (int) – Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to 30000
min_endfr (int) – Nodes ignored in lattice construction if they persist for fewer than N frames, defaults to 0
fwdflatefwid (int) – Minimum number of end frames for a word to be searched in fwdflat search, defaults to 4
fwdflatsfwin (int) – Window of frames in lattice to search for successor words in fwdflat search, defaults to 25
dict (str) – Main pronunciation dictionary (lexicon) input file
fdict (str) – Noise word pronunciation dictionary input file
dictcase (bool) – Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only), defaults to False
allphone (str) – Perform phoneme decoding with phonetic lm (given here)
allphone_ci (bool) – Perform phoneme decoding with phonetic lm and context-independent units only, defaults to True
lm (str) – Word trigram language model input file
lmctl (str) – Specify a set of language models
lmname (str) – Which language model in -lmctl to use by default
lw (float) – Language model probability weight, defaults to 6.5
fwdflatlw (float) – Language model probability weight for flat lexicon (2nd pass) decoding, defaults to 8.5
bestpathlw (float) – Language model probability weight for bestpath search, defaults to 9.5
ascale (float) – Inverse of acoustic model scale for confidence score calculation, defaults to 20.0
wip (float) – Word insertion penalty, defaults to 0.65
nwpen (float) – New word transition penalty, defaults to 1.0
pip (float) – Phone insertion penalty, defaults to 1.0
uw (float) – Unigram weight, defaults to 1.0
silprob (float) – Silence word transition probability, defaults to 0.005
fillprob (float) – Filler word transition probability, defaults to 1e-08
fsg (str) – Sphinx format finite state grammar file
jsgf (str) – JSGF grammar file
toprule (str) – Start rule for JSGF (first public rule is default)
fsgusealtpron (bool) – Add alternate pronunciations to FSG, defaults to True
fsgusefiller (bool) – Insert filler words at each state, defaults to True
keyphrase (str) – Keyphrase to spot
kws (str) – A file with keyphrases to spot, one per line
kws_plp (float) – Phone loop probability for keyphrase spotting, defaults to 0.1
kws_delay (int) – Delay to wait for best detection score, defaults to 10
kws_threshold (float) – Threshold for p(hyp)/p(alternatives) ratio, defaults to 1e-30
logfn (str) – File to write log messages in
loglevel (str) – Minimum level of log messages (DEBUG, INFO, WARN, ERROR), defaults to WARN
mfclogdir (str) – Directory to log feature files to
rawlogdir (str) – Directory to log raw audio files to
senlogdir (str) – Directory to log senone score files to
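Some of the defaults listed above are easier to read once converted into the units the decoder works in. The sketch below is plain Python and does not call pocketsphinx. It makes two assumptions that the list does not state: that nfft=0 selects the smallest power of two that holds one analysis window, and that beam widths are probabilities which are compared as log scores in base logbase. The function names frame_geometry and log_beam are hypothetical.

```python
import math


def frame_geometry(samprate=16000, frate=100, wlen=0.025625):
    """Return (frame shift, window length, FFT size), all in samples."""
    shift = int(samprate / frate + 0.5)   # samples between frame starts
    window = int(wlen * samprate + 0.5)   # samples per analysis window
    nfft = 1
    while nfft < window:                  # assumed rule for nfft=0
        nfft *= 2
    return shift, window, nfft


def log_beam(beam, logbase=1.0001):
    """Express a beam probability as a negative log score in base `logbase`."""
    return math.log(beam) / math.log(logbase)


print(frame_geometry())
# A smaller probability gives a more negative threshold, i.e. a wider beam.
print(log_beam(1e-64) < log_beam(1e-48))
```

With the default samprate, frate, and wlen, this gives a 160-sample frame shift, a 410-sample window, and, under the assumed rule, a 512-point FFT. The comparison illustrates why the description of beam says that smaller values mean a wider beam.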