Main pocketsphinx package

Main module for the PocketSphinx speech recognizer.

Decoder class

class pocketsphinx.Decoder(*args, **kwargs)

Main class for speech recognition and alignment in PocketSphinx.

See Configuration parameters for a description of keyword arguments.

Note that, as described in Config, hmm, lm, and dict are set to the default ones (some kind of US English models of unknown origin + CMUDict) if not defined. You can prevent this by passing None for any of these parameters, e.g.:

ps = Decoder(lm=None)  # Do not load a language model

Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl are set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed.
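The rule in the paragraph above can be checked before a Decoder is constructed. This is a minimal sketch: the names SEARCH_KEYS, search_args and check_search_args are hypothetical helpers, not part of pocketsphinx, and they only mirror the behaviour described here.

```python
# Search-selecting parameters named in the paragraph above.
SEARCH_KEYS = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")


def search_args(**kwargs):
    """Return the search parameters that are explicitly set (not None).

    Hypothetical helper: it inspects keyword arguments only and never
    touches pocketsphinx itself.
    """
    return [key for key in SEARCH_KEYS if kwargs.get(key) is not None]


def check_search_args(**kwargs):
    """Raise ValueError if more than one search parameter is set."""
    chosen = search_args(**kwargs)
    if len(chosen) > 1:
        raise ValueError("conflicting search parameters: " + ", ".join(chosen))
    return kwargs
```

The checked arguments can then be passed on unchanged, e.g. Decoder(**check_search_args(jsgf="commands.gram")), where the grammar file name is only a placeholder.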

You can also pass a pre-defined Config object as the only argument to the constructor, e.g.:

config = Config.parse_json(json)
ps = Decoder(config)
Parameters
  • config (Config) – Optional configuration object. You can also use keyword arguments, the most important of which are noted below. See Configuration parameters for more information.

  • hmm (str) – Path to directory containing acoustic model files.

  • dict (str) – Path to pronunciation dictionary.

  • lm (str) – Path to N-Gram language model.

  • jsgf (str) – Path to JSGF grammar file.

  • fsg (str) – Path to FSG grammar file (only one of lm, jsgf, or fsg should be specified).

  • toprule (str) – Name of top-level rule in JSGF file to use as entry point.

  • samprate (int) – Sampling rate for raw audio data.

  • loglevel (str) – Logging level, one of “INFO”, “ERROR”, “FATAL”.

  • logfn (str) – File to write log messages to.

Raises
  • ValueError – On invalid configuration or argument list.

  • RuntimeError – On invalid configuration or other failure to initialize decoder.

activate_search(self, unicode search_name=None)

Activate a search module

This activates a “search module” that was created with the methods add_fsg, add_lm, add_lm_file, add_allphone_file, add_keyphrase, or add_kws.

This API is still bad, but at least the method names make sense now.

Parameters
  • search_name (str) – Name of search module to activate. If None (or not given), the default search module (the one created with the Decoder, for instance) will be (re-)activated.

Raises

KeyError – If search_name doesn’t actually exist.
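Creating a search module and activating it are two separate steps. The sketch below chains them for a keyphrase search; use_keyphrase is a hypothetical helper, and decoder is assumed to be a Decoder instance (anything with the same two methods will do).

```python
def use_keyphrase(decoder, name, keyphrase):
    """Create a keyphrase search called `name` and make it the active one.

    Hypothetical helper: it only chains the two documented calls.
    """
    decoder.add_keyphrase(name, keyphrase)  # create, but do not activate
    decoder.activate_search(name)           # KeyError if `name` is unknown
    return name
```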

add_allphone_file(self, unicode name, unicode lmfile=None)

Create (but do not activate) a phoneme recognition search module.

Parameters
  • name (str) – Search module name to associate to allphone search.

  • lmfile (str) – Path to phoneme N-Gram file, or None to use uniform probability (default is None)

Raises

RuntimeError – If allphone search init failed for some reason.

add_fsg(self, unicode name, FsgModel fsg)

Create (but do not activate) a search module for a finite-state grammar.

Parameters
  • name (str) – Search module name to associate to this FSG.

  • fsg (FsgModel) – Previously loaded or constructed grammar.

Raises

RuntimeError – If adding FSG failed for some reason.

add_jsgf_file(self, name, filename)

Create (but do not activate) a search module from a JSGF file.

Parameters
  • filename (str) – Path to a JSGF file to load.

  • name (str) – Search module name to associate to this grammar.

Raises

RuntimeError – If adding grammar failed for some reason.

add_jsgf_string(self, name, jsgf_string)

Create (but do not activate) a search module from JSGF as bytes or string.

Parameters
  • jsgf_string (bytes|str) – JSGF grammar as string or UTF-8 encoded bytes.

  • name (str) – Search module name to associate to this grammar.

Raises

ValueError – If grammar failed to parse.
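A grammar string can be assembled in Python before it is handed to add_jsgf_string. The following sketch builds a one-rule grammar that accepts any one of a list of phrases; jsgf_alternatives is a hypothetical helper, and the grammar and rule names are whatever the caller chooses.

```python
def jsgf_alternatives(grammar, rule, phrases):
    """Build a one-rule JSGF grammar that accepts any one of `phrases`."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    body = " | ".join(phrases)
    return "#JSGF V1.0;\ngrammar %s;\npublic <%s> = %s;\n" % (grammar, rule, body)
```

It would be used as decoder.add_jsgf_string("cmd", jsgf_alternatives("cmd", "command", ["go", "stop"])). Every word in the phrases still has to be present in the pronunciation dictionary.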

add_keyphrase(self, unicode name, unicode keyphrase)

Create (but do not activate) search module from a single keyphrase.

Parameters
  • name (str) – Search module name to associate to this keyphrase.

  • keyphrase (str) – Keyphrase to add.

Raises

RuntimeError – If adding keyphrase failed for some reason.

add_kws(self, unicode name, unicode keyfile)

Create (but do not activate) keyphrase recognition search module from a file.

Parameters
  • name (str) – Search module name to associate to these keyphrases.

  • keyfile (str) – Path to file with list of keyphrases (one per line).

Raises

RuntimeError – If adding keyphrases failed for some reason.

add_lm(self, unicode name, NGramModel lm)

Create (but do not activate) a search module for an N-Gram language model.

Parameters
  • name (str) – Search module name to associate to this LM.

  • lm (NGramModel) – Previously loaded language model.

Raises

RuntimeError – If adding LM failed for some reason.

add_lm_file(self, unicode name, unicode path)

Load (but do not activate) a language model from a file into the decoder.

Parameters
  • name (str) – Search module name to associate to this LM.

  • path (str) – Path to N-Gram language model file.

Raises

RuntimeError – If adding LM failed for some reason.

add_word(self, unicode word, unicode phones, update=True)

Add a word to the pronunciation dictionary.

Parameters
  • word (str) – Text of word to be added.

  • phones (str) – Space-separated list of phones for this word’s pronunciation. This will depend on the underlying acoustic model but is probably in ARPABET.

  • update (bool) – Update the recognizer immediately. You can set this to False if you are adding a lot of words, to speed things up.

Returns

Word ID of added word.

Return type

int

Raises

RuntimeError – If adding word failed for some reason.
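When adding many words, the update flag can be left off until the last one. This is a sketch under the assumption that a final call with update=True refreshes the recognizer for every word added so far; add_words is a hypothetical helper.

```python
def add_words(decoder, entries):
    """Add (word, phones) pairs, updating the recognizer only once.

    Returns the word IDs reported by add_word, in input order.
    """
    entries = list(entries)
    last = len(entries) - 1
    ids = []
    for i, (word, phones) in enumerate(entries):
        # Defer the update until the final word.
        ids.append(decoder.add_word(word, phones, update=(i == last)))
    return ids
```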

config

Read-only property containing configuration object.

create_fsg(self, unicode name, int start_state, int final_state, transitions)

Create a finite-state grammar.

This method allows the creation of a grammar directly from a list of transitions. States and words will be created implicitly from the state numbers and word strings present in this list. Make sure that the pronunciation dictionary contains the words, or you will not be able to recognize. Basic usage:

fsg = decoder.create_fsg("mygrammar",
                         start_state=0, final_state=3,
                         transitions=[(0, 1, 0.75, "hello"),
                                      (0, 1, 0.25, "goodbye"),
                                      (1, 2, 0.75, "beautiful"),
                                      (1, 2, 0.25, "cruel"),
                                      (2, 3, 1.0, "world")])
Parameters
  • name (str) – Name to give this FSG (not very important).

  • start_state (int) – Index of starting state.

  • final_state (int) – Index of end state.

  • transitions (list) – List of transitions, each of which is a 3- or 4-tuple of (from, to, probability[, word]). If the word is not specified, this is an epsilon (null) transition that will always be followed.

Returns

Newly created finite-state grammar.

Return type

FsgModel

Raises

ValueError – On invalid input.
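Since a grammar whose words are missing from the dictionary cannot be recognized, it can help to validate the transition list first. The sketch below checks the tuple shapes described above and collects the words used; check_transitions is a hypothetical helper, and its range checks on states and probabilities are this sketch's own conservative assumptions, not rules taken from the library.

```python
def check_transitions(transitions):
    """Validate create_fsg-style transitions and return the words they use.

    Each transition is (from, to, probability) or
    (from, to, probability, word); a 3-tuple is an epsilon transition.
    """
    words = set()
    for t in transitions:
        if len(t) not in (3, 4):
            raise ValueError("expected a 3- or 4-tuple, got %r" % (t,))
        src, dst, prob = t[:3]
        if src < 0 or dst < 0:
            raise ValueError("state indices must be non-negative: %r" % (t,))
        if not 0.0 < prob <= 1.0:
            raise ValueError("probability must be in (0, 1]: %r" % (t,))
        if len(t) == 4:
            words.add(t[3])
    return sorted(words)
```

Each returned word can then be passed to lookup_word, which returns None for words that are not in the dictionary.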

Get the name of the current search (LM, grammar, etc).

Returns

Name of currently active search module.

Return type

str

static default_config()

Get the default configuration.

DEPRECATED: This does the same thing as simply creating a Config and is here for historical reasons.

Returns

Default configuration.

Return type

Config

end_utt(self)

Finish processing raw audio input.

This method must be called at the end of each separate “utterance” of raw audio input. It takes care of flushing any internal buffers and finalizing recognition results.

static file_config(unicode path)

Parse configuration from a file.

DEPRECATED: This simply calls Config.parse_file and is here for historical reasons.

Parameters

path (str) – Path to arguments file.

Returns

Configuration parsed from path.

Return type

Config

get_alignment(self)

Get the current sub-word alignment, if any.

This will return something if set_alignment (ps_set_alignment in the C API) has been called, but it will not contain an actual alignment (i.e. phone and state durations) unless a second pass of decoding has been run.

If the decoder is not in sub-word alignment mode then it will return None.

Returns

Alignment - if an alignment exists.

get_cmn(self, update=False)

Get current cepstral mean.

Parameters

update (boolean) – Update the mean based on current utterance.

Returns

Cepstral mean as a comma-separated list of numbers.

Return type

str
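The cepstral mean is exchanged as text, so a pair of small converters is handy if you want to inspect or store it. This is a sketch: parse_cmn and format_cmn are hypothetical helpers, and the two-decimal output format is an assumption made for illustration.

```python
def parse_cmn(text):
    """Convert a comma-separated cepstral mean string to a list of floats."""
    return [float(x) for x in text.split(",")]


def format_cmn(values):
    """Convert a list of numbers back to the comma-separated form."""
    return ",".join("%.2f" % v for v in values)
```

The formatted string is in the same comma-separated form that set_cmn accepts.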

get_config(self)

Get current configuration.

DEPRECATED: This does the same thing as simply accessing config and is here for historical reasons.

Returns

Current configuration.

Return type

Config

get_fsg(self, unicode name=None)

Get the currently active FsgModel or the model for a specific search module.

Parameters

name (str) – Name of search module for this FSG. If this is None (the default), the currently active FSG will be returned.

Returns

FSG corresponding to name, or None if not found.

Return type

FsgModel

get_in_speech(self)

Return speech status.

This method is retained for compatibility, but it will always return True as long as start_utt (ps_start_utt in the C API) has been previously called.

get_kws(self, unicode name=None)

Get keyphrases as text from current or specified search module.

Parameters
  • name (str) – Search module name for keywords. If this is None, the currently active keywords are returned if keyword search is active.

Returns

List of keywords as lines (i.e. separated by ‘\n’), or None if the specified search could not be found, or if name is None and keyword search is not currently active.

Return type

str

get_lattice(self)

Get word lattice from current recognition result.

Returns

Word lattice from current result.

Return type

Lattice

get_lm(self, unicode name=None)

Get the current N-Gram language model or the one associated with a search module.

Parameters

name (str) – Name of search module for this language model. If this is None (default) the current LM will be returned.

Returns

Model corresponding to name, or None if not found.

Return type

NGramModel

get_logmath(self)

Get the LogMath object for this decoder.

DEPRECATED: This does the same thing as simply accessing logmath and is here for historical reasons.

Returns

Current log-math computation object.

Return type

LogMath

get_prob(self)

Posterior probability of current recognition hypothesis.

Returns

Posterior probability of current hypothesis. This will be 1.0 unless the bestpath configuration option is enabled.

Return type

float

hyp(self)

Get current recognition hypothesis.

Returns

Current recognition output.

Return type

Hypothesis

load_dict(self, unicode dict_path, unicode fdict_path=None, unicode _format=None)

Load dictionary (and possibly noise dictionary) from a file.

Note that the format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.

Parameters
  • dict_path (str) – Path to pronunciation dictionary file.

  • fdict_path (str) – Path to noise dictionary file, or None to keep existing one (default is None)

  • _format (str) – Useless argument that does nothing.

Raises

RuntimeError – If dictionary loading failed for some reason.

logmath

Read-only property containing LogMath object for this decoder.

lookup_word(self, unicode word)

Look up a word in the dictionary and return phone transcription for it.

Parameters

word (str) – Text of word to search for.

Returns

Space-separated list of phones, or None if not found.

Return type

str

n_frames(self)

Get the number of frames processed up to this point.

Returns

Number of frames processed so far.

Return type

int

nbest(self)

Get N-Best hypotheses.

Returns

Generator over N-Best recognition results

Return type

Iterable[Hypothesis]
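Because nbest returns a generator, it is usually consumed only partially. The sketch below takes the first few results with itertools.islice; top_hypotheses is a hypothetical helper, and decoder is assumed to be a Decoder on which an utterance has already been processed.

```python
from itertools import islice


def top_hypotheses(decoder, n):
    """Return up to `n` (hypstr, score) pairs from decoder.nbest()."""
    return [(h.hypstr, h.score) for h in islice(decoder.nbest(), n)]
```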

parse_jsgf(self, jsgf_string, toprule=None)

Parse a JSGF grammar from bytes or string.

Because PocketSphinx uses UTF-8 internally, it is more efficient to parse from bytes, as a string will get encoded and subsequently decoded.

Parameters
  • jsgf_string (bytes|str) – JSGF grammar as string or UTF-8 encoded bytes.

  • toprule (str) – Name of starting rule in grammar (will default to first public rule).

Returns

Newly loaded finite-state grammar.

Return type

FsgModel

Raises
process_cep(self, data, no_search=False, full_utt=False)

Process a block of MFCC data.

Parameters
  • data (bytes) – Raw MFCC data, a block of 32-bit floating point data.

  • no_search (bool) – If True, do not do any decoding on this data.

  • full_utt (bool) – If True, assume this is the entire utterance, for purposes of acoustic normalization.

Raises

RuntimeError – If processing fails.

process_raw(self, data, no_search=False, full_utt=False)

Process a block of raw audio.

Parameters
  • data (bytes) – Raw audio data, a block of 16-bit signed integer binary data.

  • no_search (bool) – If True, do not do any decoding on this data.

  • full_utt (bool) – If True, assume this is the entire utterance, for purposes of acoustic normalization.

Raises

RuntimeError – If processing fails.
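Audio does not have to be passed in one block. The following sketch feeds a binary stream to the decoder in chunks between start_utt and end_utt; decode_stream and the chunk size are illustrative assumptions, and stream is any object with a read method that returns 16-bit PCM bytes.

```python
def decode_stream(decoder, stream, chunk_bytes=4096):
    """Feed 16-bit PCM from a binary file-like object to the decoder.

    Returns whatever decoder.hyp() returns once the utterance has ended.
    """
    decoder.start_utt()
    try:
        while True:
            data = stream.read(chunk_bytes)
            if not data:
                break
            decoder.process_raw(data, no_search=False, full_utt=False)
    finally:
        # Always close the utterance, even if processing raised.
        decoder.end_utt()
    return decoder.hyp()
```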

read_fsg(self, filename)

Read a grammar from an FSG file.

Parameters

filename (str) – Path to FSG file.

Returns

Newly loaded finite-state grammar.

Return type

FsgModel

read_jsgf(self, unicode filename)

Read a grammar from a JSGF file.

The top rule used is the one specified by the “toprule” configuration parameter.

Parameters

filename (str) – Path to JSGF file.

Returns

Newly loaded finite-state grammar.

Return type

FsgModel

reinit(self, Config config=None)

Reinitialize the decoder.

Parameters

config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.

Raises

RuntimeError – On invalid configuration or other failure to reinitialize decoder.

reinit_feat(self, Config config=None)

Reinitialize only the feature extraction.

Parameters

config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.

Raises

RuntimeError – On invalid configuration or other failure to initialize feature extraction.

Remove a search (LM, grammar, etc) freeing resources.

Parameters

search_name (str) – Name of search module to remove.

Raises

KeyError – If search_name doesn’t actually exist.

save_dict(self, unicode dict_path, unicode _format=None)

Save dictionary to a file.

Note that the format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.

Parameters
  • dict_path (str) – Path to save pronunciation dictionary in.

  • _format (str) – Useless argument that does nothing.

Raises

RuntimeError – If dictionary saving failed for some reason.

seg(self)

Get current word segmentation.

Returns

Generator over word segmentations.

Return type

Iterable[Segment]

set_align_text(self, text)

Set a word sequence for alignment and enable alignment mode.

Unlike the add_* methods and the deprecated, badly-named set_* methods, this really does immediately enable the resulting search module. This is because alignment is typically a one-shot deal, i.e. you are not likely to create a list of different alignments and keep them around. If you really want to do that, perhaps you should use FSG search instead. Or let me know and perhaps I’ll add an add_align_text method.

You must do any text normalization yourself. For word-level alignment, once you call this, simply decode and get the segmentation in the usual manner. For phone-level alignment, see set_alignment and get_alignment.

Parameters

text (str) – Sentence to align, as whitespace-separated words. All words must be present in the dictionary.

Raises

RuntimeError – If text is invalid somehow.

set_alignment(self, Alignment alignment=None)

Set up and activate sub-word alignment mode.

For efficiency reasons, decoding and word-level alignment (as done by set_align_text) do not track alignments at the sub-word level. This is fine for a lot of use cases, but obviously not all of them. If you want to obtain phone or state level alignments, you must run a second pass of alignment, which is what this function sets you up to do. The sequence is something like this:

decoder.set_align_text("hello world")
decoder.start_utt()
decoder.process_raw(data, full_utt=True)
decoder.end_utt()
decoder.set_alignment()
decoder.start_utt()
decoder.process_raw(data, full_utt=True)
decoder.end_utt()
for word in decoder.get_alignment():
    for phone in word:
        for state in phone:
            print(word, phone, state)

That’s a lot of code, so it may get simplified, either here or in a derived class, before release.

Note that if you are using this with N-Gram or FSG decoding, you can restore the default search module afterwards by calling activate_search() with no argument.

Parameters

alignment (Alignment) – Pre-constructed Alignment object. Currently you can’t actually do anything with this.

Raises

RuntimeError – If current hypothesis cannot be aligned (such as when using keyphrase or allphone search).

set_allphone_file(self, unicode name, unicode keyfile)
set_cmn(self, cmn)

Set current cepstral mean.

Parameters

cmn (str) – Cepstral mean as a comma-separated list of numbers.

set_fsg(self, unicode name, FsgModel fsg)
set_jsgf_file(self, name, filename)
set_jsgf_string(self, name, jsgf_string)
set_keyphrase(self, unicode name, unicode keyphrase)
set_kws(self, unicode name, unicode keyfile)
set_lm(self, unicode name, NGramModel lm)
set_lm_file(self, unicode name, unicode path)
start_stream(self)

Reset noise statistics.

This method can be called at the beginning of a new audio stream (but this is not necessary).

start_utt(self)

Start processing raw audio input.

This method must be called at the beginning of each separate “utterance” of raw audio input.

Raises

RuntimeError – If processing fails to start (usually if it has already been started).

Simple Recognition classes

class pocketsphinx.AudioFile(audio_file=None, **kwargs)

Simple audio file segmentation and speech recognition.

It is recommended to use the Segmenter and Decoder classes directly, but this is here in case you had code that used the old external pocketsphinx-python module, or need something very simple.

stop(*args, **kwargs)
class pocketsphinx.LiveSpeech(**kwargs)

Simple endpointing and live speech recognition.

This class is not very useful for an actual application. It is recommended to use the Endpointer and Decoder classes directly, but it is here in case you had code that used the old external pocketsphinx-python module, or need something incredibly simple.

property in_speech

Segmentation and Endpointing classes

class pocketsphinx.Segmenter(*args, **kwargs)

VAD-based speech segmentation.

This is a simple class that segments audio from an input stream, which is assumed to produce binary data as 16-bit signed integers when read is called on it. It takes the same arguments as its parent Endpointer class.

You could obviously use this on a raw audio file, but also on a sounddevice.RawInputStream or the output of sox. You can even use it with the built-in wave module, for example:

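import wave
from pocketsphinx import Segmenter

with wave.open("foo.wav", "r") as w:
    segmenter = Segmenter(sample_rate=w.getframerate())
    for seg in segmenter.segment(w.getfp()):
        with wave.open("%.2f-%.2f.wav"
                       % (seg.start_time, seg.end_time), "w") as wo:
            # The wave module needs all three header fields before writing.
            wo.setnchannels(1)  # segment() assumes single-channel input
            wo.setsampwidth(2)  # 16-bit samples
            wo.setframerate(w.getframerate())
            wo.writeframesraw(seg.pcm)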
Parameters
  • window (float) – Length in seconds of window for decision.

  • ratio (float) – Fraction of window that must be speech or non-speech to make a transition.

  • mode (int) – Aggressiveness of voice activity detection (0-3)

  • sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.

  • frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.

Raises

ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).

segment(stream)

Split a stream of data into speech segments.

Parameters

stream – File-like object returning binary data (assumed to be single-channel, 16-bit integer PCM)

Returns

Generator over SpeechSegment for each speech region detected by the Endpointer.

Return type

Iterable[SpeechSegment]

class pocketsphinx.segmenter.SpeechSegment(start_time, end_time, pcm)
end_time

Alias for field number 1

pcm

Alias for field number 2

start_time

Alias for field number 0

class pocketsphinx.Endpointer(window=0.3, ratio=0.9, vad_mode=Vad.LOOSE, sample_rate=Vad.DEFAULT_SAMPLE_RATE, frame_length=Vad.DEFAULT_FRAME_LENGTH)

Simple endpointer using voice activity detection.

Parameters
  • window (float) – Length in seconds of window for decision.

  • ratio (float) – Fraction of window that must be speech or non-speech to make a transition.

  • mode (int) – Aggressiveness of voice activity detection (0-3)

  • sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.

  • frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.

Raises

ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).

end_stream(self, frame)

Read a final frame of data and return speech if any.

This function should only be called at the end of the input stream (and then, only if you are currently in a speech region). It will return any remaining speech data detected by the endpointer.

Parameters

frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes) or less.

Returns

Remaining speech data (could be more than one frame), or None if none detected.

Return type

bytes

Raises
frame_bytes

Number of bytes (not samples) required in an input frame.

You must pass input of this size, as bytes, to the Endpointer.

Type

int

frame_length

Length of a frame in seconds (may be different from the one requested in the constructor!)

Type

float

in_speech

Is the endpointer currently in a speech segment?

To detect transitions from non-speech to speech, check this before process. If it was False but process returns data, then speech has started:

prev_in_speech = ep.in_speech
speech = ep.process(frame)
if speech is not None:
    if not prev_in_speech:
        print("Speech started at", ep.speech_start)

Likewise, to detect transitions from speech to non-speech, call this after process. If process returned data but this returns False, then speech has stopped:

speech = ep.process(frame)
if speech is not None:
    if not ep.in_speech:
        print("Speech ended at", ep.speech_end)
Type

bool

process(self, frame)

Read a frame of data and return speech if detected.

Parameters

frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).

Returns

Frame of speech data, or None if none detected.

Return type

bytes

Raises
sample_rate

Sampling rate of input data.

Type

int

speech_end

End time of current speech region.

Type

float

speech_start

Start time of current speech region.

Type

float
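The Endpointer members above can be combined into a loop that turns a stream into speech regions. This is a sketch of that idea, not the library's own implementation: endpoint_frames is a hypothetical helper, ep is assumed to behave like an Endpointer, and stream is a binary file of single-channel 16-bit PCM.

```python
def endpoint_frames(ep, stream):
    """Yield (start_time, end_time, pcm) for each speech region in `stream`."""
    chunks, start = [], None
    while True:
        frame = stream.read(ep.frame_bytes)
        if len(frame) < ep.frame_bytes:
            # End of input: flush only if a speech region is still open.
            if ep.in_speech:
                tail = ep.end_stream(frame)
                if tail is not None:
                    chunks.append(tail)
                yield start, ep.speech_end, b"".join(chunks)
            return
        was_in_speech = ep.in_speech
        speech = ep.process(frame)
        if speech is None:
            continue
        if not was_in_speech:
            start = ep.speech_start  # non-speech to speech transition
        chunks.append(speech)
        if not ep.in_speech:
            # Speech to non-speech transition: the region is complete.
            yield start, ep.speech_end, b"".join(chunks)
            chunks = []
```

If you just want the regions and do not need to customize the loop, the Segmenter class is the simpler choice.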

class pocketsphinx.Vad(mode=PS_VAD_LOOSE, sample_rate=PS_VAD_DEFAULT_SAMPLE_RATE, frame_length=PS_VAD_DEFAULT_FRAME_LENGTH)

Voice activity detection class.

Parameters
  • mode (int) – Aggressiveness of voice activity detection (0-3)

  • sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.

  • frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.

Raises

ValueError – Invalid input parameter (see above).

frame_bytes

Number of bytes (not samples) required in an input frame.

You must pass input of this size, as bytes, to the Vad.

Type

int

frame_length

Length of a frame in seconds (may be different from the one requested in the constructor!)

Type

float

is_speech(self, frame, sample_rate=None)

Classify a frame as speech or not.

Parameters

frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).

Returns

Classification as speech or not speech.

Return type

boolean

Raises
sample_rate

Sampling rate of input data.

Type

int
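Since is_speech expects exactly frame_bytes bytes, a longer buffer has to be cut into frames first. The sketch below does that and drops any trailing partial frame; speech_flags is a hypothetical helper, and vad is assumed to behave like a Vad instance.

```python
def speech_flags(vad, pcm):
    """Classify each complete frame of `pcm`; a trailing partial frame is dropped."""
    size = vad.frame_bytes
    return [vad.is_speech(pcm[i:i + size])
            for i in range(0, len(pcm) - size + 1, size)]
```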

Other classes

class pocketsphinx.Config(*args, **kwargs)

Configuration object for PocketSphinx.

The PocketSphinx recognizer can be configured either implicitly, by passing keyword arguments to Decoder, or by creating and manipulating Config objects. There are a large number of parameters, most of which are not important or subject to change.

A Config can be initialized with keyword arguments:

config = Config(hmm="path/to/things", dict="my.dict")

It can also be initialized by parsing JSON (either as bytes or str):

config = Config.parse_json('''{"hmm": "path/to/things",
                               "dict": "my.dict"}''')

The “parser” is very much not strict, so you can also pass a sort of pseudo-YAML to it, e.g.:

config = Config.parse_json("hmm: path/to/things, dict: my.dict")

You can also initialize an empty Config and set arguments in it directly:

config = Config()
config["hmm"] = "path/to/things"

In general, a Config mostly acts like a dictionary, and can be iterated over in the same fashion. However, attempting to access a parameter that does not already exist will raise a KeyError.

Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:

config = Config(lm=None)  # Do not load a language model

Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl are set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This is not the case if you decide to set one of them in an existing Config, so in that case you must make sure to set lm to None:

config["jsgf"] = "spam_eggs_and_spam.gram"
config["lm"] = None

You may also call default_search_args() after the fact to set hmm, lm, and dict to the system defaults. Note that this will set them unconditionally.

See Configuration parameters for a description of existing parameters.

default_search_args(self)

Set arguments for the default acoustic and language model.

Set hmm, lm, and dict to the default ones (some kind of US English models of unknown origin + CMUDict). This will overwrite any previous values for these parameters, and does not check if the files exist.

describe(self)

Iterate over parameter descriptions.

This function returns a generator over the parameters defined in a configuration, as Arg objects.

Returns

Descriptions of parameters including their default values and documentation

Return type

Iterable[Arg]

dumps(self)

Serialize configuration to a JSON-formatted str.

This produces JSON from a configuration object, with default values included.

Returns

Serialized JSON

Return type

str

Raises

RuntimeError – if serialization fails somehow.

exists(self, key)
get_boolean(self, key)
get_float(self, key)
get_int(self, key)
get_string(self, key)
items(self)
static parse_file(unicode path)

DEPRECATED: Parse a config file.

This reads a configuration file in “command-line” format, for example:

-arg1 value -arg2 value
-arg3 value
Parameters

path (str) – Path to configuration file.

Returns

Parsed config, or None on error.

Return type

Config

static parse_json(json)

Parse JSON (or pseudo-YAML) configuration

Parameters

json (bytes|str) – JSON data.

Returns

Parsed config, or None on error.

Return type

Config

set_boolean(self, key, val)
set_float(self, key, double val)
set_int(self, key, long val)
set_string(self, key, val)
set_string_extra(self, key, val)
class pocketsphinx.Arg(name, default, doc, type, required)

Description of a configuration parameter.

default

Default value of parameter.

doc

Description of parameter.

name

Parameter name (without leading dash).

required

Is this parameter required?

type

Type (as a Python type object) of parameter value.
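The attributes of Arg are enough to print a readable parameter listing from Config.describe. This is a sketch: describe_params is a hypothetical helper and the output layout is an arbitrary choice.

```python
def describe_params(args):
    """Format parameter descriptions as 'name (type, default=...): doc' lines."""
    lines = []
    for arg in sorted(args, key=lambda a: a.name):
        flag = " [required]" if arg.required else ""
        lines.append("%s (%s, default=%r)%s: %s"
                     % (arg.name, arg.type.__name__, arg.default, flag, arg.doc))
    return lines
```

With a real configuration this would be called as describe_params(Config().describe()).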

class pocketsphinx.LogMath(base=1.0001, shift=0, use_table=False)

Log-space computation object used by PocketSphinx.

PocketSphinx does various computations internally using integer math in logarithmic space with a very small base (usually 1.0001 or 1.0003).

add(self, p, q)
exp(self, p)
get_zero(self)
ln_to_log(self, p)
log(self, p)
log10_to_log(self, p)
log_to_ln(self, p)
log_to_log10(self, p)
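The idea behind integer log-space math with a small base can be shown in plain Python. This is a sketch of the concept only, not the LogMath implementation: the function names are hypothetical, the rounding rule is an assumption, and the shift and use_table options are ignored.

```python
import math


def to_log(p, base=1.0001):
    """Nearest-integer logarithm of probability `p` (0 < p <= 1) in `base`."""
    return round(math.log(p) / math.log(base))


def from_log(x, base=1.0001):
    """Probability corresponding to integer log value `x`."""
    return base ** x


def log_add(x, y, base=1.0001):
    """Log-space equivalent of adding two probabilities."""
    return to_log(from_log(x, base) + from_log(y, base), base)
```

Multiplying probabilities becomes integer addition of their log values, and a base this close to 1 keeps the rounding error small.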
class pocketsphinx.Jsgf(unicode path, Jsgf parent=None)

JSGF parser.

build_fsg(self, JsgfRule rule, LogMath logmath, float lw)
get_name(self)
get_rule(self, name)
class pocketsphinx.JsgfRule

JSGF Rule.

Do not create this class directly.

get_name(self)
is_public(self)
class pocketsphinx.NGramModel(Config config, LogMath logmath, unicode path)

N-Gram language model.

add_word(self, word, float weight)
casefold(self, ngram_case_t kase)
prob(self, words)
static readfile(unicode path)
size(self)
static str_to_type(unicode typestr)
static type_to_str(ngram_file_type_t _type)
write(self, unicode path, ngram_file_type_t ftype=NGRAM_AUTO)
class pocketsphinx.FsgModel(name, LogMath logmath, float lw, int nstate)

Finite-state recognition grammar.

accept(self, words)
add_alt(self, baseword, altword)
add_silence(self, silword, int state, float silprob)
static jsgf_read_file(unicode filename, LogMath logmath, float lw)
null_trans_add(self, int src, int dst, int logp)
static readfile(unicode filename, LogMath logmath, float lw)
set_final_state(self, state)
set_start_state(self, state)
tag_trans_add(self, int src, int dst, int logp, int wid)
trans_add(self, int src, int dst, int logp, int wid)
word_add(self, word)
word_id(self, word)
word_str(self, wid)
writefile(self, unicode path)
writefile_fsm(self, unicode path)
writefile_symtab(self, unicode path)
class pocketsphinx.Lattice

Word lattice.

static readfile(unicode path)
write(self, unicode path)
write_htk(self, unicode path)
class pocketsphinx.Segment

Word segmentation, as generated by Decoder.seg.

word

Name of word.

Type

str

start_frame

Index of start frame.

Type

int

end_frame

Index of end frame (inclusive!)

Type

int

ascore

Acoustic score (density).

Type

float

lscore

Language model score (joint probability).

Type

float

lback

Language model backoff order.

Type

int
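Frame indices are converted to times by dividing by the frame rate, keeping in mind that end_frame is inclusive. The sketch below assumes a frame rate of 100 frames per second; segment_times is a hypothetical helper, and the rate should be taken from your own configuration if it differs.

```python
def segment_times(start_frame, end_frame, frame_rate=100):
    """Convert an inclusive frame range to (start, end) times in seconds."""
    return start_frame / frame_rate, (end_frame + 1) / frame_rate
```

For a segment from Decoder.seg this would be called as segment_times(seg.start_frame, seg.end_frame).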

class pocketsphinx.Hypothesis(hypstr, score, prob)

Recognition hypothesis, as returned by Decoder.hyp.

hypstr

Recognized text.

Type

str

score

Recognition score.

Type

float

best_score

Alias for score for compatibility.

Type

float

prob

Posterior probability.

Type

float

class pocketsphinx.Alignment

Sub-word alignment as returned by get_alignment.

For the moment this is read-only. You are able to iterate over the words, phones, or states in it, as well as sub-iterating over each of their children, as described in AlignmentEntry.

phones(self)

Iterate over phones in the alignment.

states(self)

Iterate over states in the alignment.

words(self)

Iterate over words in the alignment.

class pocketsphinx.AlignmentEntry

Entry (word, phone, state) in an alignment.

Iterating over this will iterate over its children (i.e. the phones in a word or the states in a phone) if any. For example:

for word in decoder.get_alignment():
    print("%s from %.2f to %.2f" % (word.name, word.start,
                                    word.start + word.duration))
    for phone in word:
        print("%s at %.2f duration %.2f" %
              (phone.name, phone.start, phone.duration))
name

Name of segment (word, phone name, state id)

Type

str

start

Index of start frame.

Type

int

duration

Duration in frames.

Type

int

score

Acoustic score (density).

Type

float
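For further processing it is often easier to flatten the nested alignment into rows. This is a minimal sketch: alignment_rows is a hypothetical helper, and alignment is assumed to be the object returned by get_alignment after a second pass of alignment.

```python
def alignment_rows(alignment):
    """Flatten an alignment into (word, phone, start, duration) tuples."""
    rows = []
    for word in alignment:
        for phone in word:
            rows.append((word.name, phone.name, phone.start, phone.duration))
    return rows
```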