Main pocketsphinx package

Main module for the PocketSphinx speech recognizer.

Decoder class

class pocketsphinx.Decoder(*args, **kwargs)

Main class for speech recognition and alignment in PocketSphinx.

See Configuration parameters for a description of keyword arguments.

Note that, as described in Config, hmm, lm, and dict are set to the default ones (some kind of US English models of unknown origin + CMUDict) if not defined. You can prevent this by passing None for any of these parameters, e.g.:

ps = Decoder(lm=None)  # Do not load a language model

Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl is set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed.

You can also pass a pre-defined Config object as the only argument to the constructor, e.g.:

config = Config.parse_json(json)
ps = Decoder(config)

Parameters:

config (Config) – Optional configuration object. You can also use keyword arguments, the most important of which are noted below. See Configuration parameters for more information.
hmm (str) – Path to directory containing acoustic model files.
dict (str) – Path to pronunciation dictionary.
lm (str) – Path to N-Gram language model.
jsgf (str) – Path to JSGF grammar file.
fsg (str) – Path to FSG grammar file (only one of lm, jsgf, or fsg should be specified).
toprule (str) – Name of top-level rule in JSGF file to use as entry point.
samprate (int) – Sampling rate for raw audio data.
loglevel (str) – Logging level, one of “INFO”, “ERROR”, “FATAL”.
logfn (str) – File to write log messages to.

Raises:

ValueError – On invalid configuration or argument list.
RuntimeError – On invalid configuration or other failure to reinitialize decoder.

activate_search(self, str search_name=None)

Activate a search module

This activates a “search module” that was created with the methods add_fsg, add_lm, add_lm_file, add_allphone_file, add_keyphrase, or add_kws.

This API is still bad, but at least the method names make sense now.

Parameters:

search_name (str) – Name of search module to activate. If None (or not given), the default search module (for instance, the one created with the Decoder) will be (re-)activated.

Raises:

KeyError – If search_name doesn’t actually exist.
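A common pattern is to switch to another search module for a while and then go back to whatever was active before. The following is a minimal sketch of that idea, using only current_search and activate_search as documented here. The helper name temporary_search is hypothetical and not part of PocketSphinx.

```python
from contextlib import contextmanager


@contextmanager
def temporary_search(decoder, search_name):
    """Activate search_name for the duration of a with-block.

    Hypothetical helper: it relies only on Decoder.current_search()
    and Decoder.activate_search().
    """
    previous = decoder.current_search()  # remember the active module
    decoder.activate_search(search_name)
    try:
        yield decoder
    finally:
        # Restore the previously active module, even if decoding failed.
        decoder.activate_search(previous)
```

Assuming a search module named "wakeword" was created earlier with add_keyphrase, usage would look like `with temporary_search(decoder, "wakeword"): ...`.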

add_allphone_file(self, str name, str lmfile=None)

Create (but do not activate) a phoneme recognition search module.

Parameters:

name (str) – Search module name to associate to allphone search.
lmfile (str) – Path to phoneme N-Gram file, or None to use uniform probability (default is None)

Raises:

RuntimeError – If allphone search init failed for some reason.

add_fsg(self, str name, FsgModel fsg)

Create (but do not activate) a search module for a finite-state grammar.

Parameters:

name (str) – Search module name to associate to this FSG.
fsg (FsgModel) – Previously loaded or constructed grammar.

Raises:

RuntimeError – If adding FSG failed for some reason.

add_jsgf_file(self, name, filename)

Create (but do not activate) a search module from a JSGF file.

Parameters:

filename (str) – Path to a JSGF file to load.
name (str) – Search module name to associate to this grammar.

Raises:

RuntimeError – If adding grammar failed for some reason.

add_jsgf_string(self, name, jsgf_string)

Create (but do not activate) a search module from JSGF as bytes or string.

Parameters:

jsgf_string (bytes|str) – JSGF grammar as string or UTF-8 encoded bytes.
name (str) – Search module name to associate to this grammar.

Raises:

ValueError – If grammar failed to parse.

add_keyphrase(self, str name, str keyphrase)

Create (but do not activate) search module from a single keyphrase.

Parameters:

name (str) – Search module name to associate to this keyphrase.
keyphrase (str) – Keyphrase to add.

Raises:

RuntimeError – If adding keyphrase failed for some reason.

add_kws(self, str name, str keyfile)

Create (but do not activate) keyphrase recognition search module from a file.

Parameters:

name (str) – Search module name to associate to these keyphrases.
keyfile (str) – Path to file with list of keyphrases (one per line).

Raises:

RuntimeError – If adding keyphrases failed for some reason.

add_lm(self, str name, NGramModel lm)

Create (but do not activate) a search module for an N-Gram language model.

Parameters:

name (str) – Search module name to associate to this LM.
lm (NGramModel) – Previously loaded language model.

Raises:

RuntimeError – If adding LM failed for some reason.

add_lm_file(self, str name, str path)

Load (but do not activate) a language model from a file into the decoder.

Parameters:

name (str) – Search module name to associate to this LM.
path (str) – Path to N-Gram language model file.

Raises:

RuntimeError – If adding LM failed for some reason.

add_word(self, str word, str phones, update=True)

Add a word to the pronunciation dictionary.

Parameters:

word (str) – Text of word to be added.
phones (str) – Space-separated list of phones for this word’s pronunciation. This will depend on the underlying acoustic model but is probably in ARPABET.
update (bool) – Update the recognizer immediately. You can set this to False if you are adding a lot of words, to speed things up.

Returns:

Word ID of added word.

Return type:

int

Raises:

RuntimeError – If adding word failed for some reason.
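When adding many words, the update flag described above can be set to False for all but the last word. The sketch below shows one way to do that. The helper name add_words is hypothetical, and it assumes that a single update on the final call is enough for the recognizer to pick up the earlier additions too.

```python
def add_words(decoder, pronunciations):
    """Add (word, phones) pairs, updating the recognizer only once.

    Hypothetical helper built on Decoder.add_word(). Assumption: the
    update triggered by the last call also covers the earlier words.
    """
    entries = list(pronunciations)
    word_ids = []
    for index, (word, phones) in enumerate(entries):
        is_last = index == len(entries) - 1
        # update=False skips the (slow) recognizer update for this word.
        word_ids.append(decoder.add_word(word, phones, update=is_last))
    return word_ids
```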

config: Read-only property containing configuration object.

create_fsg(self, str name, int start_state, int final_state, transitions)

Create a finite-state grammar.

This method allows the creation of a grammar directly from a list of transitions. States and words will be created implicitly from the state numbers and word strings present in this list. Make sure that the pronunciation dictionary contains the words, or you will not be able to recognize them. Basic usage:

fsg = decoder.create_fsg("mygrammar",
                         start_state=0, final_state=3,
                         transitions=[(0, 1, 0.75, "hello"),
                                      (0, 1, 0.25, "goodbye"),
                                      (1, 2, 0.75, "beautiful"),
                                      (1, 2, 0.25, "cruel"),
                                      (2, 3, 1.0, "world")])

Parameters:

name (str) – Name to give this FSG (not very important).
start_state (int) – Index of starting state.
final_state (int) – Index of end state.
transitions (list) – List of transitions, each of which is a 3- or 4-tuple of (from, to, probability[, word]). If the word is not specified, this is an epsilon (null) transition that will always be followed.

Returns:

Newly created finite-state grammar.

Return type:

FsgModel

Raises:

ValueError – On invalid input.
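Since the words in a transition list must be in the pronunciation dictionary, it can help to check them before calling create_fsg. Here is a rough sketch that uses lookup_word, which is documented to return None for unknown words. The helper name missing_words is hypothetical.

```python
def missing_words(decoder, transitions):
    """Return words in FSG transitions that the dictionary lacks.

    Hypothetical helper: each transition is a 3- or 4-tuple of
    (from, to, probability[, word]), as accepted by create_fsg.
    """
    missing = []
    for transition in transitions:
        if len(transition) not in (3, 4):
            raise ValueError(
                "expected (from, to, probability[, word]): %r" % (transition,)
            )
        if len(transition) == 3:
            continue  # epsilon (null) transition, no word to check
        word = transition[3]
        if decoder.lookup_word(word) is None and word not in missing:
            missing.append(word)
    return missing
```

Any word it reports could then be added with add_word before the grammar is used.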

current_search(self)

Get the name of the current search (LM, grammar, etc).

Returns:: Name of currently active search module.
Return type:: str

static default_config()

Get the default configuration.

DEPRECATED: This does the same thing as simply creating a Config and is here for historical reasons.

Returns:: Default configuration.
Return type:: Config

end_utt(self)

Finish processing raw audio input.

This method must be called at the end of each separate “utterance” of raw audio input. It takes care of flushing any internal buffers and finalizing recognition results.

static file_config(str path)

Parse configuration from a file.

DEPRECATED: This simply calls Config.parse_file and is here for historical reasons.

Parameters:: path (str) – Path to arguments file.
Returns:: Configuration parsed from path.
Return type:: Config

get_alignment(self)

Get the current sub-word alignment, if any.

This will return something if set_alignment has been called, but it will not contain an actual alignment (i.e. phone and state durations) unless a second pass of decoding has been run.

If the decoder is not in sub-word alignment mode then it will return None.

Returns:: Current alignment, or None if no alignment exists.

get_cmn(self, update=False)

Get current cepstral mean.

Parameters:: update (bool) – Update the mean based on current utterance.
Returns:: Cepstral mean as a comma-separated list of numbers.
Return type:: str
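Because the cepstral mean is exchanged as a comma-separated string, a pair of small conversion helpers can be handy. This is only a sketch: the names parse_cmn and format_cmn are hypothetical, and the two-decimal output format is an assumption rather than a documented requirement.

```python
def parse_cmn(cmn):
    """Turn a comma-separated cepstral mean string into floats."""
    return [float(value) for value in cmn.split(",") if value.strip()]


def format_cmn(values):
    """Turn numbers back into a comma-separated string.

    Assumption: two decimal places are precise enough for set_cmn.
    """
    return ",".join("%.2f" % value for value in values)
```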

get_config(self)

Get current configuration.

DEPRECATED: This does the same thing as simply accessing config and is here for historical reasons.

Returns:: Current configuration.
Return type:: Config

get_fsg(self, str name=None)

Get the currently active FsgModel or the model for a specific search module.

Parameters:: name (str) – Name of search module for this FSG. If this is None (the default), the currently active FSG will be returned.
Returns:: FSG corresponding to name, or None if not found.
Return type:: FsgModel

get_in_speech(self)

Return speech status.

This method is retained for compatibility, but it will always return True as long as start_utt has been previously called.

get_kws(self, str name=None)

Get keyphrases as text from current or specified search module.

Parameters:

name (str) – Search module name for keywords. If this is None, the currently active keywords are returned if keyword search is active.

Returns:

List of keywords as lines (i.e. separated by ‘\n’), or None if the specified search could not be found, or if name is None and keyword search is not currently active.

Return type:

str

get_lattice(self)

Get word lattice from current recognition result.

Returns:: Word lattice from current result.
Return type:: Lattice

get_lm(self, str name=None)

Get the current N-Gram language model or the one associated with a search module.

Parameters:: name (str) – Name of search module for this language model. If this is None (default) the current LM will be returned.
Returns:: Model corresponding to name, or None if not found.
Return type:: NGramModel

get_logmath(self)

Get the LogMath object for this decoder.

DEPRECATED: This does the same thing as simply accessing logmath and is here for historical reasons.

Returns:: Current log-math computation object.
Return type:: LogMath

get_prob(self)

Posterior probability of current recognition hypothesis.

Returns:: Posterior probability of current hypothesis. This will be 1.0 unless the bestpath configuration option is enabled.
Return type:: float

get_search(self)

hyp(self)

Get current recognition hypothesis.

Returns:: Current recognition output.
Return type:: Hypothesis

load_dict(self, str dict_path, str fdict_path=None, str _format=None)

Load dictionary (and possibly noise dictionary) from a file.

Note that the format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.

Parameters:

dict_path (str) – Path to pronunciation dictionary file.
fdict_path (str) – Path to noise dictionary file, or None to keep existing one (default is None)
_format (str) – Useless argument that does nothing.

Raises:

RuntimeError – If dictionary loading failed for some reason.

logmath: Read-only property containing LogMath object for this decoder.

lookup_word(self, str word)

Look up a word in the dictionary and return phone transcription for it.

Parameters:: word (str) – Text of word to search for.
Returns:: Space-separated list of phones, or None if not found.
Return type:: str

n_frames(self)

Get the number of frames processed up to this point.

Returns:: Number of frames processed so far.
Return type:: int

nbest(self)

Get N-Best hypotheses.

Returns:: Generator over N-Best recognition results
Return type:: Iterable[Hypothesis]
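Since nbest returns a generator, you usually want to stop after a few results instead of exhausting it. A minimal sketch, using the hypstr and score attributes documented for Hypothesis (the helper name top_hypotheses is hypothetical):

```python
from itertools import islice


def top_hypotheses(decoder, n=5):
    """Return up to n (text, score) pairs from Decoder.nbest()."""
    # islice stops pulling from the generator after n items.
    return [(hyp.hypstr, hyp.score) for hyp in islice(decoder.nbest(), n)]
```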

parse_jsgf(self, jsgf_string, toprule=None)

Parse a JSGF grammar from bytes or string.

Because PocketSphinx uses UTF-8 internally, it is more efficient to parse from bytes, as a string will get encoded and subsequently decoded.

Parameters:

jsgf_string (bytes|str) – JSGF grammar as string or UTF-8 encoded bytes.
toprule (str) – Name of starting rule in grammar (will default to first public rule).

Returns:

Newly loaded finite-state grammar.

Return type:

FsgModel

Raises:

ValueError – On failure to parse or find toprule.
RuntimeError – If JSGF has no public rules.
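As an illustration of passing bytes, here is a sketch that builds a tiny JSGF grammar from a list of phrases and returns it UTF-8 encoded. The helper name build_command_grammar and the grammar and rule names are hypothetical. The words must still be in the pronunciation dictionary to be recognized.

```python
def build_command_grammar(phrases, grammar_name="commands", rule_name="command"):
    """Build a one-rule JSGF grammar as UTF-8 bytes.

    Hypothetical helper: the single public rule accepts any one of
    the given phrases.
    """
    alternatives = " | ".join(phrases)
    text = "#JSGF V1.0;\ngrammar %s;\npublic <%s> = %s;\n" % (
        grammar_name, rule_name, alternatives)
    return text.encode("utf-8")
```

The result could then be passed along the lines of `decoder.parse_jsgf(build_command_grammar(["turn on", "turn off"]))`.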

process_cep(self, data, no_search=False, full_utt=False)

Process a block of MFCC data.

Parameters:

data (bytes) – Raw MFCC data, a block of 32-bit floating point data.
no_search (bool) – If True, do not do any decoding on this data.
full_utt (bool) – If True, assume this is the entire utterance, for purposes of acoustic normalization.

Raises:

RuntimeError – If processing fails.

process_raw(self, data, no_search=False, full_utt=False)

Process a block of raw audio.

Parameters:

data (bytes) – Raw audio data, a block of 16-bit signed integer binary data.
no_search (bool) – If True, do not do any decoding on this data.
full_utt (bool) – If True, assume this is the entire utterance, for purposes of acoustic normalization.

Raises:

RuntimeError – If processing fails.
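The start_utt, process_raw, end_utt, and hyp methods are typically used together. The sketch below wraps that sequence for one utterance delivered in chunks. The helper name decode_chunks is hypothetical, and the chunks are assumed to be 16-bit signed PCM at the decoder's sampling rate.

```python
def decode_chunks(decoder, chunks):
    """Decode one utterance from an iterable of raw PCM byte chunks.

    Hypothetical helper built on the Decoder methods documented here.
    """
    decoder.start_utt()
    try:
        for chunk in chunks:
            if chunk:  # skip empty reads
                decoder.process_raw(chunk, no_search=False, full_utt=False)
    finally:
        # Always close the utterance so the decoder can be reused.
        decoder.end_utt()
    return decoder.hyp()
```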

read_fsg(self, filename)

Read a grammar from an FSG file.

Parameters:: filename (str) – Path to FSG file.
Returns:: Newly loaded finite-state grammar.
Return type:: FsgModel

read_jsgf(self, str filename)

Read a grammar from a JSGF file.

The top rule used is the one specified by the “toprule” configuration parameter.

Parameters:: filename (str) – Path to JSGF file.
Returns:: Newly loaded finite-state grammar.
Return type:: FsgModel

reinit(self, Config config=None)

Reinitialize the decoder.

Parameters:: config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.
Raises:: RuntimeError – On invalid configuration or other failure to reinitialize decoder.

reinit_feat(self, Config config=None)

Reinitialize only the feature extraction.

Parameters:: config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.
Raises:: RuntimeError – On invalid configuration or other failure to initialize feature extraction.

remove_search(self, str search_name)

Remove a search (LM, grammar, etc) freeing resources.

Parameters:: search_name (str) – Name of search module to remove.
Raises:: KeyError – If search_name doesn’t actually exist.

save_dict(self, str dict_path, str _format=None)

Save dictionary to a file.

Note that the format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.

Parameters:

dict_path (str) – Path to save pronunciation dictionary in.
_format (str) – Useless argument that does nothing.

Raises:

RuntimeError – If dictionary saving failed for some reason.

seg(self)

Get current word segmentation.

Returns:: Generator over word segmentations.
Return type:: Iterable[Segment]

set_align_text(self, text)

Set a word sequence for alignment and enable alignment mode.

Unlike the add_* methods and the deprecated, badly-named set_* methods, this really does immediately enable the resulting search module. This is because alignment is typically a one-shot deal, i.e. you are not likely to create a list of different alignments and keep them around. If you really want to do that, perhaps you should use FSG search instead. Or let me know and perhaps I’ll add an add_align_text method.

You must do any text normalization yourself. For word-level alignment, once you call this, simply decode and get the segmentation in the usual manner. For phone-level alignment, see set_alignment and get_alignment.

Parameters:: text (str) – Sentence to align, as whitespace-separated words. All words must be present in the dictionary.
Raises:: RuntimeError – If text is invalid somehow.

set_alignment(self, Alignment alignment=None)

Set up and activate sub-word alignment mode.

For efficiency reasons, decoding and word-level alignment (as done by set_align_text) do not track alignments at the sub-word level. This is fine for a lot of use cases, but obviously not all of them. If you want to obtain phone or state level alignments, you must run a second pass of alignment, which is what this function sets you up to do. The sequence is something like this:

decoder.set_align_text("hello world")
decoder.start_utt()
decoder.process_raw(data, full_utt=True)
decoder.end_utt()
decoder.set_alignment()
decoder.start_utt()
decoder.process_raw(data, full_utt=True)
decoder.end_utt()
for word in decoder.get_alignment():
    for phone in word:
        for state in phone:
            print(word.name, phone.name, state.start)

That’s a lot of code, so it may get simplified, either here or in a derived class, before release.

Note that if you are using this with N-Gram or FSG decoding, you can restore the default search module afterwards by calling activate_search() with no argument.

Parameters:: alignment (Alignment) – Pre-constructed Alignment object. Currently you can’t actually do anything with this.
Raises:: RuntimeError – If current hypothesis cannot be aligned (such as when using keyphrase or allphone search).

set_allphone_file(self, str name, str keyfile)

set_cmn(self, cmn)

Set current cepstral mean.

Parameters:: cmn (str) – Cepstral mean as a comma-separated list of numbers.

set_fsg(self, str name, FsgModel fsg)

set_jsgf_file(self, name, filename)

set_jsgf_string(self, name, jsgf_string)

set_keyphrase(self, str name, str keyphrase)

set_kws(self, str name, str keyfile)

set_lm(self, str name, NGramModel lm)

set_lm_file(self, str name, str path)

set_search(self, str search_name)

start_stream(self)

Reset noise statistics.

This method can be called at the beginning of a new audio stream (but this is not necessary).

start_utt(self)

Start processing raw audio input.

This method must be called at the beginning of each separate “utterance” of raw audio input.

Raises:: RuntimeError – If processing fails to start (usually if it has already been started).

unset_search(self, str search_name)

Simple Recognition classes

class pocketsphinx.AudioFile(audio_file=None, **kwargs)

Simple audio file segmentation and speech recognition.

It is recommended to use the Segmenter and Decoder classes directly, but this is here in case you had code that used the old external pocketsphinx-python module, or need something very simple.

stop(*args, **kwargs)

class pocketsphinx.LiveSpeech(**kwargs)

Simple endpointing and live speech recognition.

This class is not very useful for an actual application. It is recommended to use the Endpointer and Decoder classes directly, but it is here in case you had code that used the old external pocketsphinx-python module, or need something incredibly simple.

property in_speech

Segmentation and Endpointing classes

class pocketsphinx.Segmenter(*args, **kwargs)

VAD-based speech segmentation.

This is a simple class that segments audio from an input stream, which is assumed to produce binary data as 16-bit signed integers when read is called on it. It takes the same arguments as its parent Endpointer class.

You could obviously use this on a raw audio file, but also on a sounddevice.RawInputStream or the output of sox. You can even use it with the built-in wave module, for example:

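import wave
from pocketsphinx import Segmenter

with wave.open("foo.wav", "r") as w:
    segmenter = Segmenter(sample_rate=w.getframerate())
    for seg in segmenter.segment(w.getfp()):
        with wave.open("%.2f-%.2f.wav"
                       % (seg.start_time, seg.end_time), "w") as wo:
            wo.setnchannels(1)  # segment() assumes single-channel input
            wo.setsampwidth(2)  # 16-bit samples
            wo.setframerate(w.getframerate())
            wo.writeframesraw(seg.pcm)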

Parameters:

window (float) – Length in seconds of window for decision.
ratio (float) – Fraction of window that must be speech or non-speech to make a transition.
vad_mode (int) – Aggressiveness of voice activity detection (0-3)
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.

Raises:

ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).

segment(stream)

Split a stream of data into speech segments.

Parameters:: stream – File-like object returning binary data (assumed to be single-channel, 16-bit integer PCM)
Returns:: Generator over SpeechSegment for each speech region detected by the Endpointer.
Return type:: Iterable[SpeechSegment]
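The segments produced by segment can be fed straight to a Decoder, one utterance per segment. The following is a sketch of that combination. The helper name recognize_segments is hypothetical, and it assumes that hyp may return None when nothing was recognized.

```python
def recognize_segments(decoder, segments):
    """Yield (start_time, end_time, text) for each SpeechSegment.

    Hypothetical helper: decodes each segment as a full utterance.
    """
    for seg in segments:
        decoder.start_utt()
        # Each segment is complete, so normalize over the whole thing.
        decoder.process_raw(seg.pcm, full_utt=True)
        decoder.end_utt()
        hyp = decoder.hyp()
        # Assumption: hyp() can be None if there is no result.
        text = hyp.hypstr if hyp is not None else ""
        yield seg.start_time, seg.end_time, text
```

With a stream open, this might be called as `recognize_segments(decoder, segmenter.segment(stream))`.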

class pocketsphinx.segmenter.SpeechSegment(start_time, end_time, pcm)

end_time: Alias for field number 1

pcm: Alias for field number 2

start_time: Alias for field number 0

class pocketsphinx.Endpointer(window=0.3, ratio=0.9, vad_mode=Vad.LOOSE, sample_rate=Vad.DEFAULT_SAMPLE_RATE, frame_length=Vad.DEFAULT_FRAME_LENGTH)

Simple endpointer using voice activity detection.

Parameters:

window (float) – Length in seconds of window for decision.
ratio (float) – Fraction of window that must be speech or non-speech to make a transition.
vad_mode (int) – Aggressiveness of voice activity detection (0-3)
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.

Raises:

ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).

end_stream(self, frame)

Read a final frame of data and return speech if any.

This function should only be called at the end of the input stream (and then, only if you are currently in a speech region). It will return any remaining speech data detected by the endpointer.

Parameters:

frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes) or less.

Returns:

Remaining speech data (could be more than one frame), or None if none detected.

Return type:

bytes

Raises:

IndexError – frame is of invalid size.
ValueError – Other internal VAD error.

frame_bytes

Number of bytes (not samples) required in an input frame.

You must pass input of this size, as bytes, to the Endpointer.

Type:: int

frame_length

Length of a frame in seconds (may be different from the one requested in the constructor!)

Type:: float

in_speech

Is the endpointer currently in a speech segment?

To detect transitions from non-speech to speech, check this before process. If it was False but process returns data, then speech has started:

prev_in_speech = ep.in_speech
speech = ep.process(frame)
if speech is not None:
    if not prev_in_speech:
        print("Speech started at", ep.speech_start)

Likewise, to detect transitions from speech to non-speech, call this after process. If process returned data but this returns False, then speech has stopped:

speech = ep.process(frame)
if speech is not None:
    if not ep.in_speech:
        print("Speech ended at", ep.speech_end)

Type:: bool

process(self, frame)

Read a frame of data and return speech if detected.

Parameters:

frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).

Returns:

Frame of speech data, or None if none detected.

Return type:

bytes

Raises:

IndexError – frame is of invalid size.
ValueError – Other internal VAD error.
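Putting frame_bytes, process, end_stream, and in_speech together, a read loop over a stream might look like the sketch below. The helper name iter_speech_regions is hypothetical. It only calls end_stream for a short final frame while in a speech region, as described above, and it assumes that in_speech is False once end_stream has returned.

```python
def iter_speech_regions(ep, stream):
    """Yield (start_time, end_time, pcm) for each speech region.

    Hypothetical helper around an Endpointer `ep` and a file-like
    `stream` of single-channel 16-bit PCM.
    """
    frames, start_time = [], None
    while True:
        frame = stream.read(ep.frame_bytes)
        if not frame:
            break  # end of input
        if len(frame) < ep.frame_bytes:
            if not ep.in_speech:
                break  # short trailing frame outside speech: ignore it
            speech = ep.end_stream(frame)
        else:
            speech = ep.process(frame)
        if speech is None:
            continue
        if not frames:
            start_time = ep.speech_start
        frames.append(speech)
        if not ep.in_speech:
            # process/end_stream returned data but speech has stopped.
            yield start_time, ep.speech_end, b"".join(frames)
            frames = []
    if frames:
        # Input ended on a frame boundary while still in speech, so
        # the end time is unknown here.
        yield start_time, None, b"".join(frames)
```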

sample_rate

Sampling rate of input data.

Type:: int

speech_end

End time of current speech region.

Type:: float

speech_start

Start time of current speech region.

Type:: float

class pocketsphinx.Vad(mode=PS_VAD_LOOSE, sample_rate=PS_VAD_DEFAULT_SAMPLE_RATE, frame_length=PS_VAD_DEFAULT_FRAME_LENGTH)

Voice activity detection class.

Parameters:

mode (int) – Aggressiveness of voice activity detection (0-3)
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.

Raises:

ValueError – Invalid input parameter (see above).

frame_bytes

Number of bytes (not samples) required in an input frame.

You must pass input of this size, as bytes, to the Vad.

Type:: int

frame_length

Length of a frame in seconds (may be different from the one requested in the constructor!)

Type:: float

is_speech(self, frame, sample_rate=None)

Classify a frame as speech or not.

Parameters:

frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).

Returns:

Classification as speech or not speech.

Return type:

bool

Raises:

IndexError – frame is of invalid size.
ValueError – Other internal VAD error.

sample_rate

Sampling rate of input data.

Type:: int

Other classes

class pocketsphinx.Config(*args, **kwargs)

Configuration object for PocketSphinx.

The PocketSphinx recognizer can be configured either implicitly, by passing keyword arguments to Decoder, or by creating and manipulating Config objects. There are a large number of parameters, most of which are not important or subject to change.

A Config can be initialized with keyword arguments:

config = Config(hmm="path/to/things", dict="my.dict")

It can also be initialized by parsing JSON (either as bytes or str):

config = Config.parse_json('''{"hmm": "path/to/things",
                               "dict": "my.dict"}''')

The “parser” is very much not strict, so you can also pass a sort of pseudo-YAML to it, e.g.:

config = Config.parse_json("hmm: path/to/things, dict: my.dict")

You can also initialize an empty Config and set arguments in it directly:

config = Config()
config["hmm"] = "path/to/things"

In general, a Config mostly acts like a dictionary, and can be iterated over in the same fashion. However, attempting to access a parameter that does not already exist will raise a KeyError.

Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:

config = Config(lm=None)  # Do not load a language model

Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl is set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This is not the case if you decide to set one of them in an existing Config, so in that case you must make sure to set lm to None:

config["jsgf"] = "spam_eggs_and_spam.gram"
config["lm"] = None
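To catch this kind of conflict before creating a Decoder, you could check which of the mutually exclusive parameters are set. The sketch below works on anything dictionary-like. The names SEARCH_KEYS and active_search_keys are hypothetical, and it assumes that an unset parameter reads back as None.

```python
# The parameters of which at most one may be set (from the text above).
SEARCH_KEYS = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")


def active_search_keys(config):
    """Return the search-related parameters that are currently set.

    Hypothetical helper. Assumption: unset parameters are None.
    """
    active = []
    for key in SEARCH_KEYS:
        try:
            value = config[key]
        except KeyError:
            continue  # parameter does not exist in this mapping
        if value is not None:
            active.append(key)
    return active
```

If the returned list has more than one entry, Decoder initialization would be expected to fail.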

You may also call default_search_args() after the fact to set hmm, lm, and dict to the system defaults. Note that this will set them unconditionally.

See Configuration parameters for a description of existing parameters.

default_search_args(self)

Set arguments for the default acoustic and language model.

Set hmm, lm, and dict to the default ones (some kind of US English models of unknown origin + CMUDict). This will overwrite any previous values for these parameters, and does not check if the files exist.

describe(self)

Iterate over parameter descriptions.

This function returns a generator over the parameters defined in a configuration, as Arg objects.

Returns:: Descriptions of parameters including their default values and documentation
Return type:: Iterable[Arg]

dumps(self)

Serialize configuration to a JSON-formatted str.

This produces JSON from a configuration object, with default values included.

Returns:: Serialized JSON
Return type:: str
Raises:: RuntimeError – if serialization fails somehow.

exists(self, key)

get_boolean(self, key)

get_float(self, key)

get_int(self, key)

get_string(self, key)

items(self)

static parse_file(str path)

DEPRECATED: Parse a config file.

This reads a configuration file in “command-line” format, for example:

-arg1 value -arg2 value
-arg3 value

Parameters:: path (str) – Path to configuration file.
Returns:: Parsed config, or None on error.
Return type:: Config

static parse_json(json)

Parse JSON (or pseudo-YAML) configuration

Parameters:: json (bytes|str) – JSON data.
Returns:: Parsed config, or None on error.
Return type:: Config

set_boolean(self, key, val)

set_float(self, key, double val)

set_int(self, key, long val)

set_string(self, key, val)

set_string_extra(self, key, val)

class pocketsphinx.Arg(name, default, doc, type, required)

Description of a configuration parameter.

default: Default value of parameter.

doc: Description of parameter.

name: Parameter name (without leading dash).

required: Is this parameter required?

type: Type (as a Python type object) of parameter value.

class pocketsphinx.LogMath(base=1.0001, shift=0, use_table=False)

Log-space computation object used by PocketSphinx.

PocketSphinx does various computations internally using integer math in logarithmic space with a very small base (usually 1.0001 or 1.0003).
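To get a feel for what integer log-space with such a small base means, here is a plain-Python sketch of the idea. It does not use LogMath and is not a description of its internals: the function names are hypothetical, and rounding to the nearest integer is an assumption.

```python
import math


def to_int_log(p, base=1.0001):
    """Represent probability p as an integer logarithm in `base`."""
    # Assumption: round to nearest; LogMath itself may differ.
    return int(round(math.log(p) / math.log(base)))


def from_int_log(logp, base=1.0001):
    """Convert an integer logarithm back to a probability."""
    return base ** logp


# Multiplying probabilities becomes adding integers, up to rounding.
product = to_int_log(0.5) + to_int_log(0.25)
```

Because the base is so close to 1, each integer step changes the probability by about 0.01%, which is why integers are precise enough here.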

add(self, p, q)

exp(self, p)

get_zero(self)

ln_to_log(self, p)

log(self, p)

log10_to_log(self, p)

log_to_ln(self, p)

log_to_log10(self, p)

class pocketsphinx.Jsgf(str path, Jsgf parent=None)

JSGF parser.

build_fsg(self, JsgfRule rule, LogMath logmath, float lw)

get_name(self)

get_rule(self, name)

class pocketsphinx.JsgfRule

JSGF Rule.

Do not create this class directly.

get_name(self)

is_public(self)

class pocketsphinx.NGramModel(Config config, LogMath logmath, str path)

N-Gram language model.

add_word(self, word, float weight)

casefold(self, ngram_case_t kase)

prob(self, words)

static readfile(str path)

size(self)

static str_to_type(str typestr)

static type_to_str(ngram_file_type_t _type)

write(self, str path, ngram_file_type_t ftype=NGRAM_AUTO)

class pocketsphinx.FsgModel(name, LogMath logmath, float lw, int nstate)

Finite-state recognition grammar.

accept(self, words)

add_alt(self, baseword, altword)

add_silence(self, silword, int state, float silprob)

static jsgf_read_file(str filename, LogMath logmath, float lw)

null_trans_add(self, int src, int dst, int logp)

static readfile(str filename, LogMath logmath, float lw)

set_final_state(self, state)

set_start_state(self, state)

tag_trans_add(self, int src, int dst, int logp, int wid)

trans_add(self, int src, int dst, int logp, int wid)

word_add(self, word)

word_id(self, word)

word_str(self, wid)

writefile(self, str path)

writefile_fsm(self, str path)

writefile_symtab(self, str path)

class pocketsphinx.Lattice

Word lattice.

static readfile(str path)

write(self, str path)

write_htk(self, str path)

class pocketsphinx.Segment

Word segmentation, as generated by Decoder.seg.

word

Name of word.

Type:: str

start_frame

Index of start frame.

Type:: int

end_frame

Index of end frame (inclusive!)

Type:: int

ascore

Acoustic score (density).

Type:: float

lscore

Language model score (joint probability).

Type:: float

lback

Language model backoff order.

Type:: int
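Since end_frame is inclusive, converting a Segment to times in seconds needs one extra frame at the end. A small sketch follows. The helper name segment_times is hypothetical, and the frame rate of 100 frames per second is an assumption that must match your configuration.

```python
def segment_times(seg, frame_rate=100):
    """Return (start, end) of a Segment in seconds.

    Hypothetical helper. Assumption: frame_rate frames per second.
    """
    start = seg.start_frame / frame_rate
    # end_frame is inclusive, so the segment ends after that frame.
    end = (seg.end_frame + 1) / frame_rate
    return start, end
```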

class pocketsphinx.Hypothesis(hypstr, score, prob)

Recognition hypothesis, as returned by Decoder.hyp.

hypstr

Recognized text.

Type:: str

score

Recognition score.

Type:: float

best_score

Alias for score for compatibility.

Type:: float

prob

Posterior probability.

Type:: float

class pocketsphinx.Alignment

Sub-word alignment as returned by get_alignment.

Alignments have three levels: words, phones, and HMM states. Words contain phones, and phones contain states.

There are two ways to iterate:

Flat iteration over a single level using words(), phones(), or states().

Hierarchical iteration by iterating over an AlignmentEntry to get its children (phones of a word, or states of a phone).

phones(self): Iterate over phones in the alignment.

states(self): Iterate over states in the alignment.

words(self): Iterate over words in the alignment.

class pocketsphinx.AlignmentEntry

Entry (word, phone, or state) in an alignment.

Iterating over this will iterate over its children (phones in a word, or states in a phone) if any. For example, to print word and phone timings in seconds:

for word in decoder.get_alignment():
    print("%s from %.3f to %.3f seconds" % (word.name,
                                            word.start / 100,
                                            (word.start + word.duration) / 100))
    for phone in word:
        print("  %s at %.3f for %.3f seconds" % (phone.name,
                                                 phone.start / 100,
                                                 phone.duration / 100))

name

Text of this entry (word string, phone symbol, or state ID as string).

Type:: str

start

Start frame index. Divide by frame rate for seconds (default 100, i.e. 10ms per frame).

Type:: int

duration

Duration in frames. Divide by frame rate for seconds.

Type:: int

score

Acoustic score (log probability, higher is better).

Type:: int