Main pocketsphinx package
Main module for the PocketSphinx speech recognizer.
Decoder class
- class pocketsphinx.Decoder(*args, **kwargs)
Main class for speech recognition and alignment in PocketSphinx.
See Configuration parameters for a description of keyword arguments.
Note that, as described in Config, hmm, lm, and dict are set to the default ones (some kind of US English models of unknown origin + CMUDict) if not defined. You can prevent this by passing None for any of these parameters, e.g.:
ps = Decoder(lm=None) # Do not load a language model
Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl are set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed.
You can also pass a pre-defined Config object as the only argument to the constructor, e.g.:
config = Config.parse_json(json)
ps = Decoder(config)
- Parameters:
config (Config) – Optional configuration object. You can also use keyword arguments, the most important of which are noted below. See Configuration parameters for more information.
hmm (str) – Path to directory containing acoustic model files.
dict (str) – Path to pronunciation dictionary.
lm (str) – Path to N-Gram language model.
jsgf (str) – Path to JSGF grammar file.
fsg (str) – Path to FSG grammar file (only one of lm, jsgf, or fsg should be specified).
toprule (str) – Name of top-level rule in JSGF file to use as entry point.
samprate (int) – Sampling rate for raw audio data.
loglevel (str) – Logging level, one of “INFO”, “ERROR”, “FATAL”.
logfn (str) – File to write log messages to.
- Raises:
ValueError – On invalid configuration or argument list.
RuntimeError – On invalid configuration or other failure to reinitialize decoder.
- activate_search(self, unicode search_name=None)
Activate a search module
This activates a “search module” that was created with one of the add_* methods documented below, e.g. add_fsg, add_lm, or add_kws.
This API is still bad, but at least the method names make sense now.
- add_allphone_file(self, unicode name, unicode lmfile=None)
Create (but do not activate) a phoneme recognition search module.
- Parameters:
- Raises:
RuntimeError – If allphone search init failed for some reason.
- add_fsg(self, unicode name, FsgModel fsg)
Create (but do not activate) a search module for a finite-state grammar.
- Parameters:
- Raises:
RuntimeError – If adding FSG failed for some reason.
- add_jsgf_file(self, name, filename)
Create (but do not activate) a search module from a JSGF file.
- Parameters:
- Raises:
RuntimeError – If adding grammar failed for some reason.
- add_jsgf_string(self, name, jsgf_string)
Create (but do not activate) a search module from JSGF as bytes or string.
- Parameters:
- Raises:
ValueError – If grammar failed to parse.
- add_keyphrase(self, unicode name, unicode keyphrase)
Create (but do not activate) search module from a single keyphrase.
- Parameters:
- Raises:
RuntimeError – If adding keyphrase failed for some reason.
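Because the add_* methods only create a search module, a separate call to activate_search is needed before the module is used. The following is a minimal sketch of that two-step pattern, using only the method names documented on this page. The helper names and the default module name "wakeup" are illustrative assumptions, and `decoder` is assumed to be an already constructed Decoder.

```python
def with_keyphrase_search(decoder, phrase, name="wakeup"):
    """Register a keyphrase search, activate it, and return the active name.

    `name` is an arbitrary label chosen for this example.
    """
    decoder.add_keyphrase(name, phrase)  # created, but not yet active
    decoder.activate_search(name)        # decoding now uses the keyphrase
    return decoder.current_search()


def restore_default_search(decoder):
    """Re-activate the search module the decoder was created with."""
    decoder.activate_search()            # no argument: default module
```

Keeping registration and activation separate makes it possible to register several modules up front and switch between them cheaply between utterances.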
- add_kws(self, unicode name, unicode keyfile)
Create (but do not activate) keyphrase recognition search module from a file.
- Parameters:
- Raises:
RuntimeError – If adding keyphrases failed for some reason.
- add_lm(self, unicode name, NGramModel lm)
Create (but do not activate) a search module for an N-Gram language model.
- Parameters:
name (str) – Search module name to associate to this LM.
lm (NGramModel) – Previously loaded language model.
- Raises:
RuntimeError – If adding LM failed for some reason.
- add_lm_file(self, unicode name, unicode path)
Load (but do not activate) a language model from a file into the decoder.
- Parameters:
- Raises:
RuntimeError – If adding LM failed for some reason.
- add_word(self, unicode word, unicode phones, update=True)
Add a word to the pronunciation dictionary.
- Parameters:
word (str) – Text of word to be added.
phones (str) – Space-separated list of phones for this word’s pronunciation. This will depend on the underlying acoustic model but is probably in ARPABET.
update (bool) – Update the recognizer immediately. You can set this to False if you are adding a lot of words, to speed things up.
- Returns:
Word ID of added word.
- Return type:
- Raises:
RuntimeError – If adding word failed for some reason.
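When many words are added at once, the update flag can be deferred until the final word. The sketch below assumes that a final call with update=True is enough to make all previously added words available to the recognizer. The helper name `add_words` is hypothetical.

```python
def add_words(decoder, pronunciations):
    """Add (word, phones) pairs, updating the recognizer only once.

    Returns the word IDs reported by add_word, in input order.
    """
    entries = list(pronunciations)
    word_ids = []
    for i, (word, phones) in enumerate(entries):
        is_last = i == len(entries) - 1
        # update=False skips the recognizer update for all but the last word.
        word_ids.append(decoder.add_word(word, phones, update=is_last))
    return word_ids
```

A call might look like `add_words(decoder, [("spam", "S P AE M"), ("eggs", "EH G Z")])`, where the phone strings are ARPABET examples that must match the acoustic model in use.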
- config
Read-only property containing configuration object.
- create_fsg(self, unicode name, int start_state, int final_state, transitions)
Create a finite-state grammar.
This method allows the creation of a grammar directly from a list of transitions. States and words will be created implicitly from the state numbers and word strings present in this list. Make sure that the pronunciation dictionary contains the words, or you will not be able to recognize. Basic usage:
fsg = decoder.create_fsg("mygrammar", start_state=0, final_state=3, transitions=[(0, 1, 0.75, "hello"), (0, 1, 0.25, "goodbye"), (1, 2, 0.75, "beautiful"), (1, 2, 0.25, "cruel"), (2, 3, 1.0, "world")])
- Parameters:
name (str) – Name to give this FSG (not very important).
start_state (int) – Index of starting state.
final_state (int) – Index of end state.
transitions (list) – List of transitions, each of which is a 3- or 4-tuple of (from, to, probability[, word]). If the word is not specified, this is an epsilon (null) transition that will always be followed.
- Returns:
Newly created finite-state grammar.
- Return type:
- Raises:
ValueError – On invalid input.
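For grammars that are a simple sequence of alternatives, the transition list can be generated rather than written by hand. This is a sketch under the assumption that each "slot" of alternatives connects state i to state i + 1. The helper names `slots_to_transitions` and `build_fsg` are hypothetical and not part of the library.

```python
def slots_to_transitions(slots):
    """Turn [[(word, prob), ...], ...] into (from, to, prob, word) tuples.

    Slot i connects state i to state i + 1, so the grammar accepts
    exactly one word per slot, in order.
    """
    transitions = []
    for state, alternatives in enumerate(slots):
        for word, prob in alternatives:
            transitions.append((state, state + 1, prob, word))
    return transitions


def build_fsg(decoder, name, slots):
    """Create the grammar; the final state equals the number of slots."""
    return decoder.create_fsg(
        name,
        start_state=0,
        final_state=len(slots),
        transitions=slots_to_transitions(slots),
    )
```

With three slots (hello/goodbye, beautiful/cruel, world) this reproduces the transition list from the basic usage example above.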
- current_search(self)
Get the name of the current search (LM, grammar, etc).
- Returns:
Name of currently active search module.
- Return type:
- static default_config()
Get the default configuration.
DEPRECATED: This does the same thing as simply creating a Config and is here for historical reasons.
- Returns:
Default configuration.
- Return type:
- end_utt(self)
Finish processing raw audio input.
This method must be called at the end of each separate “utterance” of raw audio input. It takes care of flushing any internal buffers and finalizing recognition results.
- static file_config(unicode path)
Parse configuration from a file.
DEPRECATED: This simply calls Config.parse_file and is here for historical reasons.
- get_alignment(self)
Get the current sub-word alignment, if any.
This will return something if set_alignment has been called, but it will not contain an actual alignment (i.e. phone and state durations) unless a second pass of decoding has been run.
If the decoder is not in sub-word alignment mode then it will return None.
- Returns:
Alignment - if an alignment exists.
- get_cmn(self, update=False)
Get current cepstral mean.
- Parameters:
update (boolean) – Update the mean based on current utterance.
- Returns:
Cepstral mean as a comma-separated list of numbers.
- Return type:
- get_config(self)
Get current configuration.
DEPRECATED: This does the same thing as simply accessing config and is here for historical reasons.
- Returns:
Current configuration.
- Return type:
- get_fsg(self, unicode name=None)
Get the currently active FsgModel or the model for a specific search module.
- get_in_speech(self)
Return speech status.
This method is retained for compatibility, but it will always return True as long as start_utt has been previously called.
- get_kws(self, unicode name=None)
Get keyphrases as text from current or specified search module.
- Parameters:
name (str) – Search module name for keywords. If this is None, the currently active keywords are returned if keyword search is active.
- Returns:
List of keywords as lines (i.e. separated by ‘\n’), or None if the specified search could not be found, or if name is None and keyword search is not currently active.
- Return type:
- get_lattice(self)
Get word lattice from current recognition result.
- Returns:
Word lattice from current result.
- Return type:
- get_lm(self, unicode name=None)
Get the current N-Gram language model or the one associated with a search module.
- Parameters:
name (str) – Name of search module for this language model. If this is None (default) the current LM will be returned.
- Returns:
Model corresponding to name, or None if not found.
- Return type:
- get_logmath(self)
Get the LogMath object for this decoder.
DEPRECATED: This does the same thing as simply accessing logmath and is here for historical reasons.
- Returns:
Current log-math computation object.
- Return type:
- get_prob(self)
Posterior probability of current recognition hypothesis.
- Returns:
Posterior probability of current hypothesis. This will be 1.0 unless the bestpath configuration option is enabled.
- Return type:
- get_search(self)
- hyp(self)
Get current recognition hypothesis.
- Returns:
Current recognition output.
- Return type:
- load_dict(self, unicode dict_path, unicode fdict_path=None, unicode _format=None)
Load dictionary (and possibly noise dictionary) from a file.
Note that the _format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.
- Parameters:
- Raises:
RuntimeError – If dictionary loading failed for some reason.
- logmath
Read-only property containing LogMath object for this decoder.
- lookup_word(self, unicode word)
Look up a word in the dictionary and return phone transcription for it.
- n_frames(self)
Get the number of frames processed up to this point.
- Returns:
Number of frames processed so far.
- Return type:
- nbest(self)
Get N-Best hypotheses.
- Returns:
Generator over N-Best recognition results
- Return type:
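Since nbest returns a generator, it is usually convenient to cap the number of results consumed. The sketch below assumes that each yielded item exposes the hypstr and score fields listed for the Hypothesis class on this page. The helper name `top_hypotheses` is hypothetical.

```python
from itertools import islice


def top_hypotheses(decoder, n=5):
    """Return up to n (text, score) pairs from the N-best generator."""
    # islice stops after n items, so the full N-best list is never built.
    return [(h.hypstr, h.score) for h in islice(decoder.nbest(), n)]
```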
- parse_jsgf(self, jsgf_string, toprule=None)
Parse a JSGF grammar from bytes or string.
Because PocketSphinx uses UTF-8 internally, it is more efficient to parse from bytes, as a string will get encoded and subsequently decoded.
- Parameters:
- Returns:
Newly loaded finite-state grammar.
- Return type:
- Raises:
ValueError – On failure to parse or find toprule.
RuntimeError – If JSGF has no public rules.
- process_cep(self, data, no_search=False, full_utt=False)
Process a block of MFCC data.
- process_raw(self, data, no_search=False, full_utt=False)
Process a block of raw audio.
- Parameters:
- Raises:
RuntimeError – If processing fails.
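Block-wise decoding follows a fixed sequence: start_utt, repeated process_raw calls, end_utt, then hyp. The following is a minimal sketch of that sequence for a file-like object of raw 16-bit PCM. The helper name `decode_stream`, the chunk size, and the handling of a None hypothesis are assumptions made for illustration.

```python
def decode_stream(decoder, stream, chunk_bytes=4096):
    """Decode one utterance from a file-like object of raw 16-bit PCM."""
    decoder.start_utt()
    try:
        while True:
            chunk = stream.read(chunk_bytes)
            if not chunk:
                break  # end of input
            decoder.process_raw(chunk, no_search=False, full_utt=False)
    finally:
        # Always flush internal buffers and finalize the result.
        decoder.end_utt()
    hyp = decoder.hyp()
    # Assumed: hyp() may return None when nothing was recognized.
    return hyp.hypstr if hyp is not None else None
```

Typical usage would be `with open("utt.raw", "rb") as f: text = decode_stream(decoder, f)`, where the file name is a placeholder and the audio must match the decoder's samprate.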
- read_fsg(self, filename)
Read a grammar from an FSG file.
- read_jsgf(self, unicode filename)
Read a grammar from a JSGF file.
The top rule used is the one specified by the “toprule” configuration parameter.
- reinit(self, Config config=None)
Reinitialize the decoder.
- Parameters:
config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.
- Raises:
RuntimeError – On invalid configuration or other failure to reinitialize decoder.
- reinit_feat(self, Config config=None)
Reinitialize only the feature extraction.
- Parameters:
config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.
- Raises:
RuntimeError – On invalid configuration or other failure to initialize feature extraction.
- remove_search(self, unicode search_name)
Remove a search (LM, grammar, etc) freeing resources.
- save_dict(self, unicode dict_path, unicode _format=None)
Save dictionary to a file.
Note that the _format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.
- Parameters:
- Raises:
RuntimeError – If dictionary saving failed for some reason.
- seg(self)
Get current word segmentation.
- Returns:
Generator over word segmentations.
- Return type:
- set_align_text(self, text)
Set a word sequence for alignment and enable alignment mode.
Unlike the add_* methods and the deprecated, badly-named set_* methods, this really does immediately enable the resulting search module. This is because alignment is typically a one-shot deal, i.e. you are not likely to create a list of different alignments and keep them around. If you really want to do that, perhaps you should use FSG search instead. Or let me know and perhaps I’ll add an add_align_text method.
You must do any text normalization yourself. For word-level alignment, once you call this, simply decode and get the segmentation in the usual manner. For phone-level alignment, see set_alignment.
- Parameters:
text (str) – Sentence to align, as whitespace-separated words. All words must be present in the dictionary.
- Raises:
RuntimeError – If text is invalid somehow.
- set_alignment(self, Alignment alignment=None)
Set up and activate sub-word alignment mode.
For efficiency reasons, decoding and word-level alignment (as done by set_align_text) do not track alignments at the sub-word level. This is fine for a lot of use cases, but obviously not all of them. If you want to obtain phone or state level alignments, you must run a second pass of alignment, which is what this function sets you up to do. The sequence is something like this:
decoder.set_align_text("hello world")
decoder.start_utt()
decoder.process_raw(data, full_utt=True)
decoder.end_utt()
decoder.set_alignment()
decoder.start_utt()
decoder.process_raw(data, full_utt=True)
decoder.end_utt()
for word in decoder.get_alignment():
    for phone in word:
        for state in phone:
            print(word.name, phone.name, state.start)
That’s a lot of code, so it may get simplified, either here or in a derived class, before release.
Note that if you are using this with N-Gram or FSG decoding, you can restore the default search module afterwards by calling activate_search() with no argument.
- Parameters:
alignment (Alignment) – Pre-constructed Alignment object. Currently you can’t actually do anything with this.
- Raises:
RuntimeError – If current hypothesis cannot be aligned (such as when using keyphrase or allphone search).
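The two-pass sequence can be wrapped in a small helper. This is only a sketch: the function name `align_phones` is hypothetical, and it assumes alignment entries expose name, start and duration attributes and that `data` holds the whole utterance as raw 16-bit PCM.

```python
def align_phones(decoder, data, text):
    """Return (word, phone, start, duration) tuples for `text` in `data`."""
    def one_pass():
        decoder.start_utt()
        decoder.process_raw(data, full_utt=True)
        decoder.end_utt()

    decoder.set_align_text(text)  # first pass: word-level alignment
    one_pass()
    decoder.set_alignment()       # second pass: phone/state-level alignment
    one_pass()
    rows = [
        (word.name, phone.name, phone.start, phone.duration)
        for word in decoder.get_alignment()
        for phone in word
    ]
    decoder.activate_search()     # restore the default search module
    return rows
```

The rows are collected before the default search is restored, so the result does not depend on what happens to the alignment once the search module changes.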
- set_allphone_file(self, unicode name, unicode keyfile)
- set_cmn(self, cmn)
Set current cepstral mean.
- Parameters:
cmn (str) – Cepstral mean as a comma-separated list of numbers.
- set_fsg(self, unicode name, FsgModel fsg)
- set_jsgf_file(self, name, filename)
- set_jsgf_string(self, name, jsgf_string)
- set_keyphrase(self, unicode name, unicode keyphrase)
- set_kws(self, unicode name, unicode keyfile)
- set_lm(self, unicode name, NGramModel lm)
- set_lm_file(self, unicode name, unicode path)
- set_search(self, unicode search_name)
- start_stream(self)
Reset noise statistics.
This method can be called at the beginning of a new audio stream (but this is not necessary).
- start_utt(self)
Start processing raw audio input.
This method must be called at the beginning of each separate “utterance” of raw audio input.
- Raises:
RuntimeError – If processing fails to start (usually if it has already been started).
- unset_search(self, unicode search_name)
Simple Recognition classes
- class pocketsphinx.AudioFile(audio_file=None, **kwargs)
Simple audio file segmentation and speech recognition.
It is recommended to use the Segmenter and Decoder classes directly, but this is here in case you had code that used the old external pocketsphinx-python module, or need something very simple.
- class pocketsphinx.LiveSpeech(**kwargs)
Simple endpointing and live speech recognition.
This class is not very useful for an actual application. It is recommended to use the Endpointer and Decoder classes directly, but it is here in case you had code that used the old external pocketsphinx-python module, or need something incredibly simple.
- property in_speech
Segmentation and Endpointing classes
- class pocketsphinx.Segmenter(*args, **kwargs)
VAD-based speech segmentation.
This is a simple class that segments audio from an input stream, which is assumed to produce binary data as 16-bit signed integers when read is called on it. It takes the same arguments as its parent Endpointer class.
You could obviously use this on a raw audio file, but also on a sounddevice.RawInputStream or the output of sox. You can even use it with the built-in wave module, for example:
import wave
with wave.open("foo.wav", "r") as w:
    segmenter = Segmenter(sample_rate=w.getframerate())
    for seg in segmenter.segment(w.getfp()):
        with wave.open("%.2f-%.2f.wav" % (seg.start_time, seg.end_time), "w") as wo:
            wo.setnchannels(1)  # input is assumed to be single-channel
            wo.setsampwidth(2)  # 16-bit samples
            wo.setframerate(w.getframerate())
            wo.writeframesraw(seg.pcm)
- Parameters:
window (float) – Length in seconds of window for decision.
ratio (float) – Fraction of window that must be speech or non-speech to make a transition.
vad_mode (int) – Aggressiveness of voice activity detection (0-3)
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.
- Raises:
ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).
- segment(stream)
Split a stream of data into speech segments.
- Parameters:
stream – File-like object returning binary data (assumed to be single-channel, 16-bit integer PCM)
- Returns:
Generator over SpeechSegment for each speech region detected by the Endpointer.
- Return type:
- class pocketsphinx.segmenter.SpeechSegment(start_time, end_time, pcm)
- end_time
Alias for field number 1
- pcm
Alias for field number 2
- start_time
Alias for field number 0
- class pocketsphinx.Endpointer(window=0.3, ratio=0.9, vad_mode=Vad.LOOSE, sample_rate=Vad.DEFAULT_SAMPLE_RATE, frame_length=Vad.DEFAULT_FRAME_LENGTH)
Simple endpointer using voice activity detection.
- Parameters:
window (float) – Length in seconds of window for decision.
ratio (float) – Fraction of window that must be speech or non-speech to make a transition.
vad_mode (int) – Aggressiveness of voice activity detection (0-3)
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.
- Raises:
ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).
- end_stream(self, frame)
Read a final frame of data and return speech if any.
This function should only be called at the end of the input stream (and then, only if you are currently in a speech region). It will return any remaining speech data detected by the endpointer.
- Parameters:
frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes) or less.
- Returns:
Remaining speech data (could be more than one frame), or None if none detected.
- Return type:
- Raises:
IndexError – frame is of invalid size.
ValueError – Other internal VAD error.
- frame_bytes
Number of bytes (not samples) required in an input frame.
You must pass input of this size, as bytes, to the Endpointer.
- Type:
- frame_length
Length of a frame in seconds (may be different from the one requested in the constructor!)
- Type:
- in_speech
Is the endpointer currently in a speech segment?
To detect transitions from non-speech to speech, check this before process. If it was False but process returns data, then speech has started:
prev_in_speech = ep.in_speech
speech = ep.process(frame)
if speech is not None:
    if not prev_in_speech:
        print("Speech started at", ep.speech_start)
Likewise, to detect transitions from speech to non-speech, call this after process. If process returned data but this returns False, then speech has stopped:
speech = ep.process(frame)
if speech is not None:
    if not ep.in_speech:
        print("Speech ended at", ep.speech_end)
- Type:
- process(self, frame)
Read a frame of data and return speech if detected.
- Parameters:
frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).
- Returns:
Frame of speech data, or None if none detected.
- Return type:
- Raises:
IndexError – frame is of invalid size.
ValueError – Other internal VAD error.
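The process, end_stream and in_speech members combine into a simple read loop. The sketch below collects each speech region into one bytes object. The helper name `iter_utterances` is hypothetical, `ep` is assumed to be an Endpointer, and `stream` any file-like object producing single-channel 16-bit PCM.

```python
def iter_utterances(ep, stream):
    """Yield the raw PCM of each speech region found in `stream`."""
    utterance = bytearray()
    while True:
        frame = stream.read(ep.frame_bytes)
        if len(frame) < ep.frame_bytes:
            # Short (or empty) final read: only end_stream accepts it,
            # and only while inside a speech region.
            if ep.in_speech:
                speech = ep.end_stream(frame)
                if speech is not None:
                    utterance.extend(speech)
            if utterance:
                yield bytes(utterance)
            return
        speech = ep.process(frame)
        if speech is not None:
            utterance.extend(speech)
            if not ep.in_speech:  # speech-to-non-speech transition
                yield bytes(utterance)
                utterance.clear()
```

Reading exactly frame_bytes at a time keeps the loop correct even when the actual frame length differs from the one requested in the constructor.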
- class pocketsphinx.Vad(mode=PS_VAD_LOOSE, sample_rate=PS_VAD_DEFAULT_SAMPLE_RATE, frame_length=PS_VAD_DEFAULT_FRAME_LENGTH)
Voice activity detection class.
- Parameters:
mode (int) – Aggressiveness of voice activity detection (0-3)
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.
- Raises:
ValueError – Invalid input parameter (see above).
- frame_bytes
Number of bytes (not samples) required in an input frame.
You must pass input of this size, as bytes, to the Vad.
- Type:
- frame_length
Length of a frame in seconds (may be different from the one requested in the constructor!)
- Type:
- is_speech(self, frame, sample_rate=None)
Classify a frame as speech or not.
- Parameters:
frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).
- Returns:
Classification as speech or not speech.
- Return type:
- Raises:
IndexError – frame is of invalid size.
ValueError – Other internal VAD error.
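Because is_speech only accepts frames of exactly frame_bytes, a buffer has to be cut into frames before classification. The following sketch computes the fraction of speech frames in a buffer. The helper name `speech_fraction` is hypothetical, and dropping a trailing partial frame is a design choice of this example.

```python
def speech_fraction(vad, pcm):
    """Fraction of complete frames in `pcm` that the VAD labels as speech."""
    size = vad.frame_bytes       # bytes, not samples
    n_frames = len(pcm) // size  # a trailing partial frame is dropped
    if n_frames == 0:
        return 0.0
    n_speech = sum(
        bool(vad.is_speech(pcm[i * size:(i + 1) * size]))
        for i in range(n_frames)
    )
    return n_speech / n_frames
```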
Other classes
- class pocketsphinx.Config(*args, **kwargs)
Configuration object for PocketSphinx.
The PocketSphinx recognizer can be configured either implicitly, by passing keyword arguments to Decoder, or by creating and manipulating Config objects. There are a large number of parameters, most of which are not important or subject to change.
A Config can be initialized with keyword arguments:
config = Config(hmm="path/to/things", dict="my.dict")
It can also be initialized by parsing JSON (either as bytes or str):
config = Config.parse_json('''{"hmm": "path/to/things", "dict": "my.dict"}''')
The “parser” is very much not strict, so you can also pass a sort of pseudo-YAML to it, e.g.:
config = Config.parse_json("hmm: path/to/things, dict: my.dict")
You can also initialize an empty Config and set arguments in it directly:
config = Config()
config["hmm"] = "path/to/things"
In general, a Config mostly acts like a dictionary, and can be iterated over in the same fashion. However, attempting to access a parameter that does not already exist will raise a KeyError.
Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:
config = Config(lm=None) # Do not load a language model
Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl are set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This is not the case if you decide to set one of them in an existing Config, so in that case you must make sure to set lm to None:
config["jsgf"] = "spam_eggs_and_spam.gram"
config["lm"] = None
You may also call default_search_args() after the fact to set hmm, lm, and dict to the system defaults. Note that this will set them unconditionally.
See Configuration parameters for a description of existing parameters.
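When editing an existing configuration, it can help to check which search options are set before handing it to a decoder. The sketch below relies only on dictionary-style access and the KeyError behaviour described above, so it also works on a plain dict. The helper names are hypothetical, and the tuple of option names is an assumption about which parameters are mutually exclusive.

```python
# Assumed set of mutually exclusive search options.
SEARCH_OPTIONS = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")


def active_search_options(config):
    """Return the names of search options that are set to a non-None value."""
    active = []
    for name in SEARCH_OPTIONS:
        try:
            value = config[name]
        except KeyError:  # parameter does not exist in this configuration
            continue
        if value is not None:
            active.append(name)
    return active


def use_jsgf(config, path):
    """Select a JSGF grammar on an existing config and clear the default lm."""
    config["jsgf"] = path
    config["lm"] = None  # not removed automatically on an existing config
    return config
```

A quick `assert len(active_search_options(config)) <= 1` before creating the decoder turns a late initialization failure into an early, readable one.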
- default_search_args(self)
Set arguments for the default acoustic and language model.
Set hmm, lm, and dict to the default ones (some kind of US English models of unknown origin + CMUDict). This will overwrite any previous values for these parameters, and does not check if the files exist.
- describe(self)
Iterate over parameter descriptions.
This function returns a generator over the parameters defined in a configuration, as Arg objects.
- Returns:
Descriptions of parameters including their default values and documentation
- Return type:
- dumps(self)
Serialize configuration to a JSON-formatted str.
This produces JSON from a configuration object, with default values included.
- Returns:
Serialized JSON
- Return type:
- Raises:
RuntimeError – if serialization fails somehow.
- exists(self, key)
- get_boolean(self, key)
- get_float(self, key)
- get_int(self, key)
- get_string(self, key)
- items(self)
- static parse_file(unicode path)
DEPRECATED: Parse a config file.
This reads a configuration file in “command-line” format, for example:
-arg1 value -arg2 value -arg3 value
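Since parse_file is deprecated, an old "command-line" style file can be converted to JSON for parse_json instead. The following is a rough sketch of such a conversion using only the standard library. The helper name `cmdline_to_json` is hypothetical, it assumes strictly alternating option and value tokens, and it leaves every value as a string.

```python
import json
import shlex


def cmdline_to_json(text):
    """Convert '-arg1 value -arg2 value' into a JSON object string."""
    tokens = shlex.split(text)  # honours quoting, splits on any whitespace
    if len(tokens) % 2:
        raise ValueError("expected alternating -name value pairs")
    options = {}
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if not name.startswith("-"):
            raise ValueError("expected an option name, got %r" % name)
        options[name.lstrip("-")] = value  # names are stored without the dash
    return json.dumps(options)
```

The result could then be passed along as `Config.parse_json(cmdline_to_json(text))`. Whether string values are coerced to numeric or boolean types at that point is not covered by this sketch.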
- static parse_json(json)
Parse JSON (or pseudo-YAML) configuration
- set_boolean(self, key, val)
- set_float(self, key, double val)
- set_int(self, key, long val)
- set_string(self, key, val)
- set_string_extra(self, key, val)
- class pocketsphinx.Arg(name, default, doc, type, required)
Description of a configuration parameter.
- default
Default value of parameter.
- doc
Description of parameter.
- name
Parameter name (without leading dash).
- required
Is this parameter required?
- type
Type (as a Python type object) of parameter value.
- class pocketsphinx.LogMath(base=1.0001, shift=0, use_table=False)
Log-space computation object used by PocketSphinx.
PocketSphinx does various computations internally using integer math in logarithmic space with a very small base (usually 1.0001 or 1.0003).
- add(self, p, q)
- exp(self, p)
- get_zero(self)
- ln_to_log(self, p)
- log(self, p)
- log10_to_log(self, p)
- log_to_ln(self, p)
- log_to_log10(self, p)
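To illustrate what integer math in log space with a very small base means, here is a floating-point sketch of the idea in plain Python. It is not how LogMath is implemented. The function names are hypothetical, and only the base 1.0001 comes from the constructor signature above.

```python
import math

BASE = 1.0001  # default base from the LogMath signature


def to_log(p, base=BASE):
    """Integer logarithm of probability p in the given base."""
    return int(math.log(p) / math.log(base))


def from_log(logp, base=BASE):
    """Probability corresponding to an integer log value."""
    return base ** logp


def log_add(logp, logq, base=BASE):
    """Integer log of (p + q), computed from the integer logs of p and q."""
    hi, lo = max(logp, logq), min(logp, logq)
    # log(p + q) = log(p) + log(1 + q / p), with p the larger value.
    return hi + int(math.log1p(base ** (lo - hi)) / math.log(base))
```

With such a small base, multiplying probabilities becomes adding integers, and the rounding error per operation stays small: `to_log(0.5) + to_log(0.5)` differs from `to_log(0.25)` by about one unit.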
- class pocketsphinx.Jsgf(unicode path, Jsgf parent=None)
JSGF parser.
- build_fsg(self, JsgfRule rule, LogMath logmath, float lw)
- get_name(self)
- get_rule(self, name)
- class pocketsphinx.JsgfRule
JSGF Rule.
Do not create this class directly.
- get_name(self)
- is_public(self)
- class pocketsphinx.NGramModel(Config config, LogMath logmath, unicode path)
N-Gram language model.
- add_word(self, word, float weight)
- casefold(self, ngram_case_t kase)
- prob(self, words)
- static readfile(unicode path)
- size(self)
- static str_to_type(unicode typestr)
- static type_to_str(ngram_file_type_t _type)
- write(self, unicode path, ngram_file_type_t ftype=NGRAM_AUTO)
- class pocketsphinx.FsgModel(name, LogMath logmath, float lw, int nstate)
Finite-state recognition grammar.
- accept(self, words)
- add_alt(self, baseword, altword)
- add_silence(self, silword, int state, float silprob)
- static jsgf_read_file(unicode filename, LogMath logmath, float lw)
- null_trans_add(self, int src, int dst, int logp)
- static readfile(unicode filename, LogMath logmath, float lw)
- set_final_state(self, state)
- set_start_state(self, state)
- tag_trans_add(self, int src, int dst, int logp, int wid)
- trans_add(self, int src, int dst, int logp, int wid)
- word_add(self, word)
- word_id(self, word)
- word_str(self, wid)
- writefile(self, unicode path)
- writefile_fsm(self, unicode path)
- writefile_symtab(self, unicode path)
- class pocketsphinx.Lattice
Word lattice.
- static readfile(unicode path)
- write(self, unicode path)
- write_htk(self, unicode path)
- class pocketsphinx.Segment
Word segmentation, as generated by Decoder.seg.
- class pocketsphinx.Hypothesis(hypstr, score, prob)
Recognition hypothesis, as returned by Decoder.hyp.
- class pocketsphinx.Alignment
Sub-word alignment as returned by Decoder.get_alignment.
For the moment this is read-only. You are able to iterate over the words, phones, or states in it, as well as sub-iterating over each of their children, as described in AlignmentEntry.
- phones(self)
Iterate over phones in the alignment.
- states(self)
Iterate over states in the alignment.
- words(self)
Iterate over words in the alignment.
- class pocketsphinx.AlignmentEntry
Entry (word, phone, state) in an alignment.
Iterating over this will iterate over its children (i.e. the phones in a word or the states in a phone) if any. For example:
for word in decoder.get_alignment():
    print("%s from %.2f to %.2f" % (word.name, word.start, word.start + word.duration))
    for phone in word:
        print("%s at %.2f duration %.2f" % (phone.name, phone.start, phone.duration))