Main pocketsphinx package
Main module for the PocketSphinx speech recognizer.
Decoder class
- class pocketsphinx.Decoder(*args, **kwargs)
Main class for speech recognition and alignment in PocketSphinx.
See Configuration parameters for a description of keyword arguments.
Note that, as described in Config, hmm, lm, and dict are set to the default ones (some kind of US English models of unknown origin + CMUDict) if not defined. You can prevent this by passing None for any of these parameters, e.g.:
    ps = Decoder(lm=None)  # Do not load a language model
Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl are set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed.
You can also pass a pre-defined Config object as the only argument to the constructor, e.g.:
    config = Config.parse_json(json)
    ps = Decoder(config)
- Parameters
config (Config) – Optional configuration object. You can also use keyword arguments, the most important of which are noted below. See Configuration parameters for more information.
hmm (str) – Path to directory containing acoustic model files.
dict (str) – Path to pronunciation dictionary.
lm (str) – Path to N-Gram language model.
jsgf (str) – Path to JSGF grammar file.
fsg (str) – Path to FSG grammar file (only one of lm, jsgf, or fsg should be specified).
toprule (str) – Name of top-level rule in JSGF file to use as entry point.
samprate (int) – Sampling rate for raw audio data.
loglevel (str) – Logging level, one of “INFO”, “ERROR”, “FATAL”.
logfn (str) – File to write log messages to.
- Raises
ValueError – On invalid configuration or argument list.
RuntimeError – On invalid configuration or other failure to reinitialize decoder.
- activate_search(self, unicode search_name=None)
Activate a search module
This activates a “search module” that was created with the methods add_fsg, add_lm, add_lm_file, add_allphone_file, add_keyphrase, or add_kws.
This API is still bad, but at least the method names make sense now.
- add_allphone_file(self, unicode name, unicode lmfile=None)
Create (but do not activate) a phoneme recognition search module.
- Parameters
- Raises
RuntimeError – If allphone search init failed for some reason.
- add_fsg(self, unicode name, FsgModel fsg)
Create (but do not activate) a search module for a finite-state grammar.
- Parameters
- Raises
RuntimeError – If adding FSG failed for some reason.
- add_jsgf_file(self, name, filename)
Create (but do not activate) a search module from a JSGF file.
- Parameters
- Raises
RuntimeError – If adding grammar failed for some reason.
- add_jsgf_string(self, name, jsgf_string)
Create (but do not activate) a search module from JSGF as bytes or string.
- Parameters
- Raises
ValueError – If grammar failed to parse.
- add_keyphrase(self, unicode name, unicode keyphrase)
Create (but do not activate) search module from a single keyphrase.
- Parameters
- Raises
RuntimeError – If adding keyphrase failed for some reason.
- add_kws(self, unicode name, unicode keyfile)
Create (but do not activate) keyphrase recognition search module from a file.
- Parameters
- Raises
RuntimeError – If adding keyphrases failed for some reason.
- add_lm(self, unicode name, NGramModel lm)
Create (but do not activate) a search module for an N-Gram language model.
- Parameters
name (str) – Search module name to associate to this LM.
lm (NGramModel) – Previously loaded language model.
- Raises
RuntimeError – If adding LM failed for some reason.
- add_lm_file(self, unicode name, unicode path)
Load (but do not activate) a language model from a file into the decoder.
- Parameters
- Raises
RuntimeError – If adding LM failed for some reason.
- add_word(self, unicode word, unicode phones, update=True)
Add a word to the pronunciation dictionary.
- Parameters
word (str) – Text of word to be added.
phones (str) – Space-separated list of phones for this word’s pronunciation. This will depend on the underlying acoustic model but is probably in ARPABET.
update (bool) – Update the recognizer immediately. You can set this to False if you are adding a lot of words, to speed things up.
- Returns
Word ID of added word.
- Return type
- Raises
RuntimeError – If adding word failed for some reason.
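The update flag described above suggests a bulk-loading pattern: defer the recognizer update while adding many words, then trigger it once at the end. The following is a minimal sketch of that idea, not part of the library. The helper name add_words is hypothetical, and it assumes that a single update=True call on the last word is enough to refresh the recognizer for all the words added before it.

```python
def add_words(decoder, entries):
    """Add many (word, phones) pairs, updating the recognizer only once.

    `decoder` is any object with an add_word(word, phones, update=...)
    method, such as a pocketsphinx Decoder.  `entries` is a sequence of
    (word, phones) pairs, where phones is a space-separated string.
    """
    entries = list(entries)
    word_ids = []
    for i, (word, phones) in enumerate(entries):
        is_last = i == len(entries) - 1
        # Assumption: updating on the final word covers the earlier ones.
        word_ids.append(decoder.add_word(word, phones, update=is_last))
    return word_ids
```

Because the helper only relies on the add_word signature shown above, it can be exercised with a stand-in object before being pointed at a real decoder.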
- config
Read-only property containing configuration object.
- create_fsg(self, unicode name, int start_state, int final_state, transitions)
Create a finite-state grammar.
This method allows the creation of a grammar directly from a list of transitions. States and words will be created implicitly from the state numbers and word strings present in this list. Make sure that the pronunciation dictionary contains the words, or you will not be able to recognize them. Basic usage:
    fsg = decoder.create_fsg("mygrammar",
                             start_state=0, final_state=3,
                             transitions=[(0, 1, 0.75, "hello"),
                                          (0, 1, 0.25, "goodbye"),
                                          (1, 2, 0.75, "beautiful"),
                                          (1, 2, 0.25, "cruel"),
                                          (2, 3, 1.0, "world")])
- Parameters
name (str) – Name to give this FSG (not very important).
start_state (int) – Index of starting state.
final_state (int) – Index of end state.
transitions (list) – List of transitions, each of which is a 3- or 4-tuple of (from, to, probability[, word]). If the word is not specified, this is an epsilon (null) transition that will always be followed.
- Returns
Newly created finite-state grammar.
- Return type
- Raises
ValueError – On invalid input.
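Since the words in a transition list must already be in the pronunciation dictionary, it can help to inspect the list before handing it to create_fsg. The sketch below is a pure-Python pre-check, not something the library provides: the name summarize_transitions is hypothetical, and reporting the total outgoing probability per state is only a convenience (this reference does not say whether the probabilities must sum to 1).

```python
from collections import defaultdict


def summarize_transitions(transitions):
    """Return (words, outgoing) for a list of (from, to, prob[, word]) tuples.

    `words` is the set of word strings used by the grammar, and `outgoing`
    maps each source state to the sum of its outgoing probabilities.
    """
    outgoing = defaultdict(float)
    words = set()
    for t in transitions:
        if len(t) not in (3, 4):
            raise ValueError(f"expected a 3- or 4-tuple, got {t!r}")
        src, _dst, prob = t[:3]
        outgoing[src] += prob
        if len(t) == 4:
            # A 3-tuple is an epsilon (null) transition and carries no word.
            words.add(t[3])
    return words, dict(outgoing)
```

The returned word set could then be checked against the dictionary, for example with lookup_word, assuming it returns None for unknown words (this reference only says that it returns the phone transcription).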
- current_search(self)
Get the name of the current search (LM, grammar, etc).
- Returns
Name of currently active search module.
- Return type
- static default_config()
Get the default configuration.
DEPRECATED: This does the same thing as simply creating a Config and is here for historical reasons.
- Returns
Default configuration.
- Return type
- end_utt(self)
Finish processing raw audio input.
This method must be called at the end of each separate “utterance” of raw audio input. It takes care of flushing any internal buffers and finalizing recognition results.
- static file_config(unicode path)
Parse configuration from a file.
DEPRECATED: This simply calls Config.parse_file and is here for historical reasons.
- get_alignment(self)
Get the current sub-word alignment, if any.
This will return something if set_alignment has been called, but it will not contain an actual alignment (i.e. phone and state durations) unless a second pass of decoding has been run.
If the decoder is not in sub-word alignment mode then it will return None.
- Returns
Alignment - if an alignment exists.
- get_cmn(self, update=False)
Get current cepstral mean.
- Parameters
update (boolean) – Update the mean based on current utterance.
- Returns
Cepstral mean as a comma-separated list of numbers.
- Return type
- get_config(self)
Get current configuration.
DEPRECATED: This does the same thing as simply accessing config and is here for historical reasons.
- Returns
Current configuration.
- Return type
- get_fsg(self, unicode name=None)
Get the currently active FsgModel or the model for a specific search module.
- get_in_speech(self)
Return speech status.
This method is retained for compatibility, but it will always return True as long as start_utt has been previously called.
- get_kws(self, unicode name=None)
Get keyphrases as text from current or specified search module.
- Parameters
name (str) – Search module name for keywords. If this is None, the currently active keywords are returned if keyword search is active.
- Returns
List of keywords as lines (i.e. separated by ‘\n’), or None if the specified search could not be found, or if name is None and keyword search is not currently active.
- Return type
- get_lattice(self)
Get word lattice from current recognition result.
- Returns
Word lattice from current result.
- Return type
- get_lm(self, unicode name=None)
Get the current N-Gram language model or the one associated with a search module.
- Parameters
name (str) – Name of search module for this language model. If this is None (default) the current LM will be returned.
- Returns
Model corresponding to name, or None if not found.
- Return type
- get_logmath(self)
Get the LogMath object for this decoder.
DEPRECATED: This does the same thing as simply accessing logmath and is here for historical reasons.
- Returns
Current log-math computation object.
- Return type
- get_prob(self)
Posterior probability of current recognition hypothesis.
- Returns
Posterior probability of current hypothesis. This will be 1.0 unless the bestpath configuration option is enabled.
- Return type
- get_search(self)
- hyp(self)
Get current recognition hypothesis.
- Returns
Current recognition output.
- Return type
- load_dict(self, unicode dict_path, unicode fdict_path=None, unicode _format=None)
Load dictionary (and possibly noise dictionary) from a file.
Note that the format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.
- Parameters
- Raises
RuntimeError – If dictionary loading failed for some reason.
- logmath
Read-only property containing LogMath object for this decoder.
- lookup_word(self, unicode word)
Look up a word in the dictionary and return phone transcription for it.
- n_frames(self)
Get the number of frames processed up to this point.
- Returns
Number of frames processed so far.
- Return type
- nbest(self)
Get N-Best hypotheses.
- Returns
Generator over N-Best recognition results
- Return type
Iterable[Hypothesis]
- parse_jsgf(self, jsgf_string, toprule=None)
Parse a JSGF grammar from bytes or string.
Because PocketSphinx uses UTF-8 internally, it is more efficient to parse from bytes, as a string will get encoded and subsequently decoded.
- Parameters
- Returns
Newly loaded finite-state grammar.
- Return type
- Raises
ValueError – On failure to parse or find toprule.
RuntimeError – If JSGF has no public rules.
- process_cep(self, data, no_search=False, full_utt=False)
Process a block of MFCC data.
- process_raw(self, data, no_search=False, full_utt=False)
Process a block of raw audio.
- read_fsg(self, filename)
Read a grammar from an FSG file.
- read_jsgf(self, unicode filename)
Read a grammar from a JSGF file.
The top rule used is the one specified by the “toprule” configuration parameter.
- reinit(self, Config config=None)
Reinitialize the decoder.
- Parameters
config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.
- Raises
RuntimeError – On invalid configuration or other failure to reinitialize decoder.
- reinit_feat(self, Config config=None)
Reinitialize only the feature extraction.
- Parameters
config (Config) – Optional new configuration to apply, otherwise the existing configuration in the config attribute will be reloaded.
- Raises
RuntimeError – On invalid configuration or other failure to initialize feature extraction.
- remove_search(self, unicode search_name)
Remove a search (LM, grammar, etc) freeing resources.
- save_dict(self, unicode dict_path, unicode _format=None)
Save dictionary to a file.
Note that the format argument does nothing, never has done anything, and never will. It’s only here for historical reasons.
- Parameters
- Raises
RuntimeError – If dictionary saving failed for some reason.
- seg(self)
Get current word segmentation.
- Returns
Generator over word segmentations.
- Return type
Iterable[Segment]
- set_align_text(self, text)
Set a word sequence for alignment and enable alignment mode.
Unlike the add_* methods and the deprecated, badly-named set_* methods, this really does immediately enable the resulting search module. This is because alignment is typically a one-shot deal, i.e. you are not likely to create a list of different alignments and keep them around. If you really want to do that, perhaps you should use FSG search instead. Or let me know and perhaps I’ll add an add_align_text method.
You must do any text normalization yourself. For word-level alignment, once you call this, simply decode and get the segmentation in the usual manner. For phone-level alignment, see set_alignment and get_alignment.
- Parameters
text (str) – Sentence to align, as whitespace-separated words. All words must be present in the dictionary.
- Raises
RuntimeError – If text is invalid somehow.
- set_alignment(self, Alignment alignment=None)
Set up and activate sub-word alignment mode.
For efficiency reasons, decoding and word-level alignment (as done by set_align_text) do not track alignments at the sub-word level. This is fine for a lot of use cases, but obviously not all of them. If you want to obtain phone or state level alignments, you must run a second pass of alignment, which is what this function sets you up to do. The sequence is something like this:
    decoder.set_align_text("hello world")
    decoder.start_utt()
    decoder.process_raw(data, full_utt=True)
    decoder.end_utt()
    decoder.set_alignment()
    decoder.start_utt()
    decoder.process_raw(data, full_utt=True)
    decoder.end_utt()
    for word in decoder.get_alignment():
        for phone in word:
            for state in phone:
                print(word, phone, state)
That’s a lot of code, so it may get simplified, either here or in a derived class, before release.
Note that if you are using this with N-Gram or FSG decoding, you can restore the default search module afterwards by calling activate_search() with no argument.
- Parameters
alignment (Alignment) – Pre-constructed Alignment object. Currently you can’t actually do anything with this.
- Raises
RuntimeError – If current hypothesis cannot be aligned (such as when using keyphrase or allphone search).
- set_allphone_file(self, unicode name, unicode keyfile)
- set_cmn(self, cmn)
Set current cepstral mean.
- Parameters
cmn (str) – Cepstral mean as a comma-separated list of numbers.
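Both get_cmn and set_cmn exchange the cepstral mean as a comma-separated string, so converting to and from a list of floats is a natural companion step. The helpers below are a small sketch of that conversion, not part of the library. The names parse_cmn and format_cmn are hypothetical, and the two-decimal output precision is an arbitrary choice.

```python
def parse_cmn(cmn):
    """Turn a comma-separated cepstral mean string into a list of floats."""
    return [float(part) for part in cmn.split(",") if part.strip()]


def format_cmn(values, digits=2):
    """Turn a sequence of numbers back into a comma-separated string."""
    return ",".join(f"{value:.{digits}f}" for value in values)
```

A typical round trip would be something like decoder.set_cmn(format_cmn(parse_cmn(decoder.get_cmn()))), with any adjustment to the values done in between.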
- set_fsg(self, unicode name, FsgModel fsg)
- set_jsgf_file(self, name, filename)
- set_jsgf_string(self, name, jsgf_string)
- set_keyphrase(self, unicode name, unicode keyphrase)
- set_kws(self, unicode name, unicode keyfile)
- set_lm(self, unicode name, NGramModel lm)
- set_lm_file(self, unicode name, unicode path)
- set_search(self, unicode search_name)
- start_stream(self)
Reset noise statistics.
This method can be called at the beginning of a new audio stream (but this is not necessary).
- start_utt(self)
Start processing raw audio input.
This method must be called at the beginning of each separate “utterance” of raw audio input.
- Raises
RuntimeError – If processing fails to start (usually if it has already been started).
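Taken together, start_utt, process_raw, end_utt, and hyp describe a simple per-utterance loop. The sketch below shows one way to arrange those calls around a file-like stream. The function name decode_stream and the chunk size are hypothetical, and the None check on the result of hyp is a defensive assumption, since this reference does not say what hyp returns when nothing was recognized.

```python
def decode_stream(decoder, stream, chunk_bytes=4096):
    """Decode one utterance from a file-like stream of raw audio.

    `decoder` is any object with start_utt, process_raw, end_utt and hyp
    methods (such as a pocketsphinx Decoder).  Returns the hypothesis
    text, or None if there is no hypothesis.
    """
    decoder.start_utt()
    try:
        while True:
            data = stream.read(chunk_bytes)
            if not data:
                break
            decoder.process_raw(data)
    finally:
        # Always close the utterance so the decoder can be reused.
        decoder.end_utt()
    hyp = decoder.hyp()
    return None if hyp is None else hyp.hypstr
```

Putting end_utt in a finally clause is a design choice: start_utt is documented to raise RuntimeError if an utterance is already in progress, so leaving one open after an error would break the next call.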
- unset_search(self, unicode search_name)
Simple Recognition classes
- class pocketsphinx.AudioFile(audio_file=None, **kwargs)
Simple audio file segmentation and speech recognition.
It is recommended to use the Segmenter and Decoder classes directly, but this is here in case you had code that used the old external pocketsphinx-python module, or need something very simple.
- class pocketsphinx.LiveSpeech(**kwargs)
Simple endpointing and live speech recognition.
This class is not very useful for an actual application. It is recommended to use the Endpointer and Decoder classes directly, but it is here in case you had code that used the old external pocketsphinx-python module, or need something incredibly simple.
- property in_speech
Segmentation and Endpointing classes
- class pocketsphinx.Segmenter(*args, **kwargs)
VAD-based speech segmentation.
This is a simple class that segments audio from an input stream, which is assumed to produce binary data as 16-bit signed integers when read is called on it. It takes the same arguments as its parent Endpointer class.
You could obviously use this on a raw audio file, but also on a sounddevice.RawInputStream or the output of sox. You can even use it with the built-in wave module, for example:
    import wave
    from pocketsphinx import Segmenter

    with wave.open("foo.wav", "r") as w:
        segmenter = Segmenter(sample_rate=w.getframerate())
        for seg in segmenter.segment(w.getfp()):
            with wave.open("%.2f-%.2f.wav"
                           % (seg.start_time, seg.end_time), "w") as wo:
                wo.setnchannels(1)  # segments are single-channel
                wo.setsampwidth(2)  # 16-bit samples
                wo.setframerate(w.getframerate())
                wo.writeframesraw(seg.pcm)
- Parameters
window (float) – Length in seconds of window for decision.
ratio (float) – Fraction of window that must be speech or non-speech to make a transition.
vad_mode (int) – Aggressiveness of voice activity detection (0-3).
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.
- Raises
ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).
- segment(stream)
Split a stream of data into speech segments.
- Parameters
stream – File-like object returning binary data (assumed to be single-channel, 16-bit integer PCM)
- Returns
Generator over SpeechSegment for each speech region detected by the Endpointer.
- Return type
Iterable[SpeechSegment]
- class pocketsphinx.segmenter.SpeechSegment(start_time, end_time, pcm)
- end_time
Alias for field number 1
- pcm
Alias for field number 2
- start_time
Alias for field number 0
- class pocketsphinx.Endpointer(window=0.3, ratio=0.9, vad_mode=Vad.LOOSE, sample_rate=Vad.DEFAULT_SAMPLE_RATE, frame_length=Vad.DEFAULT_FRAME_LENGTH)
Simple endpointer using voice activity detection.
- Parameters
window (float) – Length in seconds of window for decision.
ratio (float) – Fraction of window that must be speech or non-speech to make a transition.
vad_mode (int) – Aggressiveness of voice activity detection (0-3).
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.
- Raises
ValueError – Invalid input parameter. Also raised if the ratio makes it impossible to do endpointing (i.e. it is more than N-1 or less than 1 frame).
- end_stream(self, frame)
Read a final frame of data and return speech if any.
This function should only be called at the end of the input stream (and then, only if you are currently in a speech region). It will return any remaining speech data detected by the endpointer.
- Parameters
frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes) or less.
- Returns
Remaining speech data (could be more than one frame), or None if none detected.
- Return type
- Raises
IndexError – frame is of invalid size.
ValueError – Other internal VAD error.
- frame_bytes
Number of bytes (not samples) required in an input frame.
You must pass input of this size, as bytes, to the Endpointer.
- Type
- frame_length
Length of a frame in seconds (may be different from the one requested in the constructor!)
- Type
- in_speech
Is the endpointer currently in a speech segment?
To detect transitions from non-speech to speech, check this before process. If it was False but process returns data, then speech has started:
    prev_in_speech = ep.in_speech
    speech = ep.process(frame)
    if speech is not None:
        if not prev_in_speech:
            print("Speech started at", ep.speech_start)
Likewise, to detect transitions from speech to non-speech, call this after process. If process returned data but this returns False, then speech has stopped:
    speech = ep.process(frame)
    if speech is not None:
        if not ep.in_speech:
            print("Speech ended at", ep.speech_end)
- Type
- process(self, frame)
Read a frame of data and return speech if detected.
- Parameters
frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).
- Returns
Frame of speech data, or None if none detected.
- Return type
- Raises
IndexError – frame is of invalid size.
ValueError – Other internal VAD error.
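The frame_bytes, process, end_stream, and in_speech members can be combined into a loop that collects whole speech regions from a stream. The sketch below illustrates that combination under stated assumptions and is not the library's own segmentation code. The name speech_regions is hypothetical, speech_start and speech_end are taken from the in_speech examples above, and passing an empty final frame to end_stream is assumed to be acceptable because the reference allows frames of length frame_bytes or less.

```python
def speech_regions(ep, stream):
    """Yield (start, end, pcm) for each speech region found in `stream`.

    `ep` is any object with the Endpointer members used here: frame_bytes,
    in_speech, speech_start, speech_end, process() and end_stream().
    """
    chunks = []
    while True:
        frame = stream.read(ep.frame_bytes)
        at_end = len(frame) < ep.frame_bytes
        if at_end:
            # end_stream should only be called while inside a speech region.
            speech = ep.end_stream(frame) if ep.in_speech else None
        else:
            speech = ep.process(frame)
        if speech is not None:
            chunks.append(speech)
        # A region is complete when speech has stopped or input ran out.
        if chunks and (at_end or not ep.in_speech):
            yield ep.speech_start, ep.speech_end, b"".join(chunks)
            chunks = []
        if at_end:
            return
```

Writing the loop against a duck-typed object keeps it testable without audio hardware or model files. In real use the Segmenter class already provides this kind of iteration through its segment method.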
- class pocketsphinx.Vad(mode=PS_VAD_LOOSE, sample_rate=PS_VAD_DEFAULT_SAMPLE_RATE, frame_length=PS_VAD_DEFAULT_FRAME_LENGTH)
Voice activity detection class.
- Parameters
mode (int) – Aggressiveness of voice activity detection (0-3).
sample_rate (int) – Sampling rate of input, default is 16000. Rates other than 8000, 16000, 32000, 48000 are only approximately supported, see note in frame_length. Outlandish sampling rates like 3924 and 115200 will raise a ValueError.
frame_length (float) – Desired input frame length in seconds, default is 0.03. The actual frame length may be different if an approximately supported sampling rate is requested. You must always use the frame_bytes and frame_length attributes to determine the input size.
- Raises
ValueError – Invalid input parameter (see above).
- frame_bytes
Number of bytes (not samples) required in an input frame.
You must pass input of this size, as bytes, to the Vad.
- Type
- frame_length
Length of a frame in seconds (may be different from the one requested in the constructor!)
- Type
- is_speech(self, frame, sample_rate=None)
Classify a frame as speech or not.
- Parameters
frame (bytes) – Buffer containing speech data (16-bit signed integers). Must be of length frame_bytes (in bytes).
- Returns
Classification as speech or not speech.
- Return type
boolean
- Raises
IndexError – frame is of invalid size.
ValueError – Other internal VAD error.
Other classes
- class pocketsphinx.Config(*args, **kwargs)
Configuration object for PocketSphinx.
The PocketSphinx recognizer can be configured either implicitly, by passing keyword arguments to Decoder, or by creating and manipulating Config objects. There are a large number of parameters, most of which are not important or subject to change.
A Config can be initialized with keyword arguments:
    config = Config(hmm="path/to/things", dict="my.dict")
It can also be initialized by parsing JSON (either as bytes or str):
config = Config.parse_json('''{"hmm": "path/to/things", "dict": "my.dict"}''')
The “parser” is very much not strict, so you can also pass a sort of pseudo-YAML to it, e.g.:
config = Config.parse_json("hmm: path/to/things, dict: my.dict")
You can also initialize an empty Config and set arguments in it directly:
    config = Config()
    config["hmm"] = "path/to/things"
In general, a Config mostly acts like a dictionary, and can be iterated over in the same fashion. However, attempting to access a parameter that does not already exist will raise a KeyError.
Many parameters have default values. Also, when constructing a Config directly (as opposed to parsing JSON), hmm, lm, and dict are set to the default models (some kind of US English models of unknown origin + CMUDict). You can prevent this by passing None for any of these parameters, e.g.:
    config = Config(lm=None)  # Do not load a language model
Decoder initialization will fail if more than one of lm, jsgf, fsg, keyphrase, kws, allphone, or lmctl are set in the configuration. To make life easier, and because there is no possible case in which you would do this intentionally, if you initialize a Decoder or Config with any of these (and not lm), the default lm value will be removed. This is not the case if you decide to set one of them in an existing Config, so in that case you must make sure to set lm to None:
    config["jsgf"] = "spam_eggs_and_spam.gram"
    config["lm"] = None
You may also call default_search_args() after the fact to set hmm, lm, and dict to the system defaults. Note that this will set them unconditionally.
See Configuration parameters for a description of existing parameters.
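Because only one of the search-related parameters may be set, it can be useful to check a set of intended settings before building a Config from them. The following is a sketch of such a check on a plain dictionary. The names SEARCH_KEYS and check_single_search are hypothetical, and the function only mirrors the rule stated above without calling into the library.

```python
# The mutually exclusive search parameters listed in the text above.
SEARCH_KEYS = ("lm", "jsgf", "fsg", "keyphrase", "kws", "allphone", "lmctl")


def check_single_search(settings):
    """Return the one search key set in `settings`, or None if none is set.

    Raises ValueError when more than one search parameter has a value,
    which is the situation in which Decoder initialization would fail.
    """
    active = [key for key in SEARCH_KEYS if settings.get(key) is not None]
    if len(active) > 1:
        raise ValueError("more than one search specified: " + ", ".join(active))
    return active[0] if active else None
```

For example, a dictionary holding both a jsgf path and a leftover lm path is rejected, while the same dictionary with lm set to None passes, which matches the advice to set lm to None by hand in an existing Config.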
- default_search_args(self)
Set arguments for the default acoustic and language model.
Set hmm, lm, and dict to the default ones (some kind of US English models of unknown origin + CMUDict). This will overwrite any previous values for these parameters, and does not check if the files exist.
- describe(self)
Iterate over parameter descriptions.
This function returns a generator over the parameters defined in a configuration, as Arg objects.
- Returns
Descriptions of parameters including their default values and documentation
- Return type
Iterable[Arg]
- dumps(self)
Serialize configuration to a JSON-formatted str.
This produces JSON from a configuration object, with default values included.
- Returns
Serialized JSON
- Return type
- Raises
RuntimeError – if serialization fails somehow.
- exists(self, key)
- get_boolean(self, key)
- get_float(self, key)
- get_int(self, key)
- get_string(self, key)
- items(self)
- static parse_file(unicode path)
DEPRECATED: Parse a config file.
This reads a configuration file in “command-line” format, for example:
-arg1 value -arg2 value -arg3 value
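To make the “command-line” format concrete, the sketch below turns such text into a dictionary of names and string values. It is an illustration only and is not the parser that parse_file uses. The name parse_arg_text is hypothetical, and handling quoted values with shlex is an assumption about how values containing spaces would be written.

```python
import shlex


def parse_arg_text(text):
    """Parse "-name value -name value ..." text into a dict of strings."""
    tokens = shlex.split(text)
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating -name value pairs")
    args = {}
    for flag, value in zip(tokens[::2], tokens[1::2]):
        if not flag.startswith("-"):
            raise ValueError(f"expected an argument name, got {flag!r}")
        # Parameter names are stored without the leading dash.
        args[flag.lstrip("-")] = value
    return args
```

The resulting dictionary has the same shape as the keyword arguments accepted by Config, although every value is still a string at this point.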
- static parse_json(json)
Parse JSON (or pseudo-YAML) configuration
- set_boolean(self, key, val)
- set_float(self, key, double val)
- set_int(self, key, long val)
- set_string(self, key, val)
- set_string_extra(self, key, val)
- class pocketsphinx.Arg(name, default, doc, type, required)
Description of a configuration parameter.
- default
Default value of parameter.
- doc
Description of parameter.
- name
Parameter name (without leading dash).
- required
Is this parameter required?
- type
Type (as a Python type object) of parameter value.
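The Arg fields listed above are enough to build a quick summary of the available parameters, for instance from the objects yielded by Config.describe. The sketch below formats any iterable of objects that have name, default, type, and required attributes. The function name format_params and the output layout are hypothetical.

```python
def format_params(args):
    """Return one summary line per parameter, sorted by parameter name."""
    lines = []
    for arg in sorted(args, key=lambda a: a.name):
        suffix = " (required)" if arg.required else ""
        # arg.type is documented as a Python type object, so it has __name__.
        lines.append(
            f"{arg.name}: {arg.type.__name__}, default={arg.default!r}{suffix}"
        )
    return lines
```

With a real configuration this would be called as format_params(config.describe()), assuming describe yields Arg objects as stated in its description.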
- class pocketsphinx.LogMath(base=1.0001, shift=0, use_table=False)
Log-space computation object used by PocketSphinx.
PocketSphinx does various computations internally using integer math in logarithmic space with a very small base (usually 1.0001 or 1.0003).
- add(self, p, q)
- exp(self, p)
- get_zero(self)
- ln_to_log(self, p)
- log(self, p)
- log10_to_log(self, p)
- log_to_ln(self, p)
- log_to_log10(self, p)
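The idea of integer math in logarithmic space with a very small base can be illustrated in a few lines of pure Python. The class below is a conceptual sketch only and is not the library's implementation: TinyLogMath is a hypothetical name, it ignores the shift and use_table options, and its rounding may differ from what LogMath actually does.

```python
import math


class TinyLogMath:
    """Toy integer log-space arithmetic with a base close to 1."""

    def __init__(self, base=1.0001):
        self.base = base
        self._ln_base = math.log(base)

    def log(self, p):
        """Convert a probability to an integer log value in this base."""
        return round(math.log(p) / self._ln_base)

    def exp(self, logp):
        """Convert an integer log value back to a probability."""
        return math.exp(logp * self._ln_base)

    def add(self, logp, logq):
        """Return log(p + q) given log(p) and log(q), staying in log space."""
        hi, lo = max(logp, logq), min(logp, logq)
        # log(p + q) = log(p) + log(1 + q/p), with p the larger term.
        correction = math.log1p(math.exp((lo - hi) * self._ln_base))
        return hi + round(correction / self._ln_base)
```

A base this close to 1 means that one integer step corresponds to a change of roughly 0.01% in probability, so integers are precise enough for scores while staying cheap to add and compare.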
- class pocketsphinx.Jsgf(unicode path, Jsgf parent=None)
JSGF parser.
- build_fsg(self, JsgfRule rule, LogMath logmath, float lw)
- get_name(self)
- get_rule(self, name)
- class pocketsphinx.JsgfRule
JSGF Rule.
Do not create this class directly.
- get_name(self)
- is_public(self)
- class pocketsphinx.NGramModel(Config config, LogMath logmath, unicode path)
N-Gram language model.
- add_word(self, word, float weight)
- casefold(self, ngram_case_t kase)
- prob(self, words)
- static readfile(unicode path)
- size(self)
- static str_to_type(unicode typestr)
- static type_to_str(ngram_file_type_t _type)
- write(self, unicode path, ngram_file_type_t ftype=NGRAM_AUTO)
- class pocketsphinx.FsgModel(name, LogMath logmath, float lw, int nstate)
Finite-state recognition grammar.
- accept(self, words)
- add_alt(self, baseword, altword)
- add_silence(self, silword, int state, float silprob)
- static jsgf_read_file(unicode filename, LogMath logmath, float lw)
- null_trans_add(self, int src, int dst, int logp)
- static readfile(unicode filename, LogMath logmath, float lw)
- set_final_state(self, state)
- set_start_state(self, state)
- tag_trans_add(self, int src, int dst, int logp, int wid)
- trans_add(self, int src, int dst, int logp, int wid)
- word_add(self, word)
- word_id(self, word)
- word_str(self, wid)
- writefile(self, unicode path)
- writefile_fsm(self, unicode path)
- writefile_symtab(self, unicode path)
- class pocketsphinx.Lattice
Word lattice.
- static readfile(unicode path)
- write(self, unicode path)
- write_htk(self, unicode path)
- class pocketsphinx.Segment
Word segmentation, as generated by Decoder.seg.
- class pocketsphinx.Hypothesis(hypstr, score, prob)
Recognition hypothesis, as returned by Decoder.hyp.
- class pocketsphinx.Alignment
Sub-word alignment as returned by get_alignment.
For the moment this is read-only. You are able to iterate over the words, phones, or states in it, as well as sub-iterating over each of their children, as described in AlignmentEntry.
- phones(self)
Iterate over phones in the alignment.
- states(self)
Iterate over states in the alignment.
- words(self)
Iterate over words in the alignment.
- class pocketsphinx.AlignmentEntry
Entry (word, phone, state) in an alignment.
Iterating over this will iterate over its children (i.e. the phones in a word or the states in a phone) if any. For example:
    for word in decoder.get_alignment():
        print("%s from %.2f to %.2f" % (word.name, word.start,
                                        word.start + word.duration))
        for phone in word:
            print("%s at %.2f duration %.2f" %
                  (phone.name, phone.start, phone.duration))