Configuration parameters

These are the parameters currently recognized by pocketsphinx.Config and pocketsphinx.Decoder along with their default values.

Config(*args, **kwargs)

Create a PocketSphinx configuration from keyword arguments described below. For example:

config = Config(hmm="path/to/things", dict="my.dict")

The same keyword arguments can also be passed directly to the constructor for pocketsphinx.Decoder.
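
As a minimal sketch of that equivalence, the snippet below feeds one dictionary of settings to both constructors. The helper name build_both and the chosen values are illustrative assumptions, not part of the library; the import is deferred into the function so the snippet can be loaded on a machine without PocketSphinx installed.

```python
# Settings shared by both constructors. The values are illustrative only.
SETTINGS = {
    "samprate": 16000,    # documented default sampling rate
    "loglevel": "ERROR",  # one of DEBUG, INFO, WARN, ERROR
}


def build_both(settings):
    """Build a Config and a Decoder from the same keyword arguments.

    Hypothetical helper, not part of pocketsphinx.
    """
    # Imported lazily so this module loads even without pocketsphinx installed.
    from pocketsphinx import Config, Decoder

    config = Config(**settings)
    decoder = Decoder(**settings)  # hmm, lm and dict fall back to the default models
    return config, decoder
```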

Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:

config = Config(lm=None)  # Do not load a language model

Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl is set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This is not the case if you set one of them in an existing Config, so in that case you must make sure to set lm to None:

config["jsgf"] = "spam_eggs_and_spam.gram"
config["lm"] = None
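
The rule above can be modelled in plain Python without the library. This is only a sketch of the behaviour as described in this section, not the actual implementation: the names SEARCH_MODES, DEFAULT_LM, resolve_search_mode, active_modes and check are all hypothetical, and DEFAULT_LM is a placeholder string rather than a real model path.

```python
# Search-mode parameters named in the text above; at most one may be set.
SEARCH_MODES = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")

# Placeholder standing in for whatever default language model is installed.
DEFAULT_LM = "<default lm>"


def resolve_search_mode(**kwargs):
    """Mimic the constructor rule: skip the default lm if another mode is given."""
    settings = dict(kwargs)
    others = [m for m in SEARCH_MODES
              if m != "lm" and settings.get(m) is not None]
    if "lm" not in settings and not others:
        # The default applies only when no search mode was chosen at all.
        settings["lm"] = DEFAULT_LM
    return settings


def active_modes(settings):
    """Return the search modes that are actually set (present and not None)."""
    return [m for m in SEARCH_MODES if settings.get(m) is not None]


def check(settings):
    """Raise ValueError if more than one search mode is set."""
    modes = active_modes(settings)
    if len(modes) > 1:
        raise ValueError("conflicting search modes: " + ", ".join(modes))
    return settings


# At construction time the default lm is never added alongside jsgf.
check(resolve_search_mode(jsgf="spam_eggs_and_spam.gram"))

# Editing an existing configuration does not remove the default lm,
# so it has to be cleared by hand before the check passes.
later = resolve_search_mode()
later["jsgf"] = "spam_eggs_and_spam.gram"
later["lm"] = None
check(later)
```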

Keyword Arguments
  • hmm (str) – Directory containing acoustic model files.

  • logspec (bool) – Write out logspectral files instead of cepstra, defaults to False

  • smoothspec (bool) – Write out cepstral-smoothed logspectral files, defaults to False

  • transform (str) – Which type of transform to use to calculate cepstra (legacy, dct, or htk), defaults to legacy

  • alpha (float) – Preemphasis parameter, defaults to 0.97

  • samprate (int) – Sampling rate, defaults to 16000

  • frate (int) – Frame rate, defaults to 100

  • wlen (float) – Hamming window length, defaults to 0.025625

  • nfft (int) – Size of FFT, or 0 to set automatically (recommended), defaults to 0

  • nfilt (int) – Number of filter banks, defaults to 40

  • lowerf (float) – Lower edge of filters, defaults to 133.33334

  • upperf (float) – Upper edge of filters, defaults to 6855.4976

  • unit_area (bool) – Normalize mel filters to unit area, defaults to True

  • round_filters (bool) – Round mel filter frequencies to DFT points, defaults to True

  • ncep (int) – Number of cepstral coefficients, defaults to 13

  • doublebw (bool) – Use double bandwidth filters (same center freq), defaults to False

  • lifter (int) – Length of sin-curve for liftering, or 0 for no liftering, defaults to 0

  • input_endian (str) – Endianness of input data, big or little, ignored if NIST or MS Wav, defaults to little

  • warp_type (str) – Warping function type (or shape), defaults to inverse_linear

  • warp_params (str) – Parameters defining the warping function

  • dither (bool) – Add 1/2-bit noise, defaults to False

  • seed (int) – Seed for random number generator; if less than zero, pick our own, defaults to -1

  • remove_dc (bool) – Remove DC offset from each frame, defaults to False

  • remove_noise (bool) – Remove noise using spectral subtraction, defaults to False

  • verbose (bool) – Show input filenames, defaults to False

  • feat (str) – Feature stream type, depends on the acoustic model, defaults to 1s_c_d_dd

  • ceplen (int) – Number of components in the input feature vector, defaults to 13

  • cmn (str) – Cepstral mean normalization scheme (‘live’, ‘batch’, or ‘none’), defaults to live

  • cmninit (str) – Initial values (comma-separated) for cepstral mean when ‘live’ is used, defaults to 40,3,-1

  • varnorm (bool) – Variance normalize each utterance (only if cmn is ‘batch’), defaults to False

  • agc (str) – Automatic gain control for c0 (‘max’, ‘emax’, ‘noise’, or ‘none’), defaults to none

  • agcthresh (float) – Initial threshold for automatic gain control, defaults to 2.0

  • lda (str) – File containing transformation matrix to be applied to features (single-stream features only)

  • ldadim (int) – Dimensionality of output of feature transformation (0 to use entire matrix), defaults to 0

  • svspec (str) – Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)

  • featparams (str) – File containing feature extraction parameters.

  • mdef (str) – Model definition input file

  • senmgau (str) – Senone to codebook mapping input file (usually not needed)

  • tmat (str) – HMM state transition matrix input file

  • tmatfloor (float) – HMM state transition probability floor (applied to the tmat file), defaults to 0.0001

  • mean (str) – Mixture gaussian means input file

  • var (str) – Mixture gaussian variances input file

  • varfloor (float) – Mixture gaussian variance floor (applied to data from the var file), defaults to 0.0001

  • mixw (str) – Senone mixture weights input file (uncompressed)

  • mixwfloor (float) – Senone mixture weights floor (applied to data from the mixw file), defaults to 1e-07

  • aw (int) – Inverse weight applied to acoustic scores, defaults to 1

  • sendump (str) – Senone dump (compressed mixture weights) input file

  • mllr (str) – MLLR transformation to apply to means and variances

  • mmap (bool) – Use memory-mapped I/O (if possible) for model files, defaults to True

  • ds (int) – Frame GMM computation downsampling ratio, defaults to 1

  • topn (int) – Maximum number of top Gaussians to use in scoring, defaults to 4

  • topn_beam (str) – Beam width used to determine top-N Gaussians (or a list, per-feature), defaults to 0

  • logbase (float) – Base in which all log-likelihoods are calculated, defaults to 1.0001

  • beam (float) – Beam width applied to every frame in Viterbi search (smaller values mean wider beam), defaults to 1e-48

  • wbeam (float) – Beam width applied to word exits, defaults to 7e-29

  • pbeam (float) – Beam width applied to phone transitions, defaults to 1e-48

  • lpbeam (float) – Beam width applied to last phone in words, defaults to 1e-40

  • lponlybeam (float) – Beam width applied to last phone in single-phone words, defaults to 7e-29

  • fwdflatbeam (float) – Beam width applied to every frame in second-pass flat search, defaults to 1e-64

  • fwdflatwbeam (float) – Beam width applied to word exits in second-pass flat search, defaults to 7e-29

  • pl_window (int) – Phoneme lookahead window size, in frames, defaults to 5

  • pl_beam (float) – Beam width applied to phone loop search for lookahead, defaults to 1e-10

  • pl_pbeam (float) – Beam width applied to phone loop transitions for lookahead, defaults to 1e-10

  • pl_pip (float) – Phone insertion penalty for phone loop, defaults to 1.0

  • pl_weight (float) – Weight for phoneme lookahead penalties, defaults to 3.0

  • compallsen (bool) – Compute all senone scores in every frame (can be faster when there are many senones), defaults to False

  • fwdtree (bool) – Run forward lexicon-tree search (1st pass), defaults to True

  • fwdflat (bool) – Run forward flat-lexicon search over word lattice (2nd pass), defaults to True

  • bestpath (bool) – Run bestpath (Dijkstra) search over word lattice (3rd pass), defaults to True

  • backtrace (bool) – Print results and backtraces to log, defaults to False

  • latsize (int) – Initial backpointer table size, defaults to 5000

  • maxwpf (int) – Maximum number of distinct word exits at each frame (or -1 for no pruning), defaults to -1

  • maxhmmpf (int) – Maximum number of active HMMs to maintain at each frame (or -1 for no pruning), defaults to 30000

  • min_endfr (int) – Nodes ignored in lattice construction if they persist for fewer than N frames, defaults to 0

  • fwdflatefwid (int) – Minimum number of end frames for a word to be searched in fwdflat search, defaults to 4

  • fwdflatsfwin (int) – Window of frames in lattice to search for successor words in fwdflat search, defaults to 25

  • dict (str) – Main pronunciation dictionary (lexicon) input file

  • fdict (str) – Noise word pronunciation dictionary input file

  • dictcase (bool) – Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only), defaults to False

  • allphone (str) – Perform phoneme decoding with phonetic lm (given here)

  • allphone_ci (bool) – Perform phoneme decoding with phonetic lm and context-independent units only, defaults to True

  • lm (str) – Word trigram language model input file

  • lmctl (str) – Specify a set of language models

  • lmname (str) – Which language model in lmctl to use by default

  • lw (float) – Language model probability weight, defaults to 6.5

  • fwdflatlw (float) – Language model probability weight for flat lexicon (2nd pass) decoding, defaults to 8.5

  • bestpathlw (float) – Language model probability weight for bestpath search, defaults to 9.5

  • ascale (float) – Inverse of acoustic model scale for confidence score calculation, defaults to 20.0

  • wip (float) – Word insertion penalty, defaults to 0.65

  • nwpen (float) – New word transition penalty, defaults to 1.0

  • pip (float) – Phone insertion penalty, defaults to 1.0

  • uw (float) – Unigram weight, defaults to 1.0

  • silprob (float) – Silence word transition probability, defaults to 0.005

  • fillprob (float) – Filler word transition probability, defaults to 1e-08

  • fsg (str) – Sphinx format finite state grammar file

  • jsgf (str) – JSGF grammar file

  • toprule (str) – Start rule for JSGF (first public rule is default)

  • fsgusealtpron (bool) – Add alternate pronunciations to FSG, defaults to True

  • fsgusefiller (bool) – Insert filler words at each state, defaults to True

  • keyphrase (str) – Keyphrase to spot

  • kws (str) – A file with keyphrases to spot, one per line

  • kws_plp (float) – Phone loop probability for keyphrase spotting, defaults to 0.1

  • kws_delay (int) – Delay to wait for best detection score, defaults to 10

  • kws_threshold (float) – Threshold for p(hyp)/p(alternatives) ratio, defaults to 1e-30

  • logfn (str) – File to write log messages in

  • loglevel (str) – Minimum level of log messages (DEBUG, INFO, WARN, ERROR), defaults to WARN

  • mfclogdir (str) – Directory to log feature files to

  • rawlogdir (str) – Directory to log raw audio files to

  • senlogdir (str) – Directory to log senone score files to
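
Several of the defaults above are easier to read once converted into derived quantities. The following sketch works out the frame geometry implied by samprate, frate, wlen and nfft, and shows why a smaller beam value means a wider beam. It rests on two assumptions that are not stated in this section: that nfft=0 selects the smallest power of two that holds one window, and that beams are applied as log-domain thresholds in base logbase. The function names frame_geometry and beam_to_log are hypothetical and not part of the library.

```python
import math


def frame_geometry(samprate=16000, frate=100, wlen=0.025625, nfft=0):
    """Return (shift, window, fft_size) in samples for the given settings."""
    shift = int(samprate / frate + 0.5)   # samples between successive frame starts
    window = int(wlen * samprate + 0.5)   # samples covered by one analysis window
    if nfft == 0:
        # Assumed automatic choice: smallest power of two that holds the window.
        nfft = 1
        while nfft < window:
            nfft *= 2
    return shift, window, nfft


def beam_to_log(beam, logbase=1.0001):
    """Express a beam probability as a log score in the given base (assumed form)."""
    return math.log(beam) / math.log(logbase)


shift, window, fft_size = frame_geometry()
print(shift, window, fft_size)  # 160 410 512

# A smaller beam value gives a more negative threshold, so fewer paths are pruned.
print(beam_to_log(1e-48) < beam_to_log(1e-40))  # True
```

With the defaults, consecutive 410-sample windows start 160 samples apart, so neighbouring frames overlap by 250 samples.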