[Ossri] Imagine, using something like this ..
Ivan Uemlianin
i.uemlianin at bangor.ac.uk
Tue Nov 14 04:01:54 EST 2006
Ken MacLean wrote:
> Hi Doug,
>
> The HTK toolkit actually does not 'require' segmented speech (i.e.
> short, 10-15 word, speech audio files without time alignments), it's
> just that Acoustic Model training performance with HTK is very slow
> when you use one large file (with time alignments) - not sure if this
> is a 'design feature'.
SphinxTrain is much the same. At least, it doesn't use time alignments
(AFAIK) and won't accept input files longer than 60 secs. The shorter
you can make your input files, the faster the training process and (I
think) the stronger the resulting AM, which stands to reason if it's
true.
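As a rough illustration of checking file lengths before training, here is a minimal sketch. It assumes uncompressed WAV input and takes the 60 sec figure above as the limit; the function names and the flat directory layout are hypothetical and not part of SphinxTrain.

```python
import wave
from pathlib import Path

MAX_SECONDS = 60.0  # assumed limit, taken from the figure quoted above


def wav_duration(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def too_long(directory, limit=MAX_SECONDS):
    """Return sorted (filename, seconds) pairs for WAV files longer than limit."""
    flagged = []
    for p in Path(directory).glob("*.wav"):
        seconds = wav_duration(p)
        if seconds > limit:
            flagged.append((p.name, seconds))
    return sorted(flagged)
```

Anything `too_long()` reports would need to be segmented further before being handed to the trainer.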
CMUSeg [1] will split up files by silence (or perhaps by non-speech - it
looks a bit too sophisticated to be looking just for silence). I once
wrote a GUI [2] which showed me the transcription for one of these
long files, and played just the last second of each of the segments -
enough to tell me where the transcription should be cut. I would make a
mark on the transcription, and the app [2] would generate a
transcription file for that segment and play the last second of the
next segment. The hours fly by: you can cut 10 hours of speech into
30 sec segments in a morning. You're no good in the afternoon though.
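The marking workflow above could be sketched roughly as follows. This is not the original tool: the "|" cut mark, the function names and the output naming scheme are all assumptions made for illustration, the audio segments are assumed to be WAV files, and actual playback is left out (only the frames for the last second are extracted).

```python
import wave
from pathlib import Path

CUT_MARK = "|"  # hypothetical marker typed into the transcription at each cut


def last_second(path, seconds=1.0):
    """Return the raw frames for the final `seconds` of a WAV segment."""
    with wave.open(str(path), "rb") as w:
        n = min(w.getnframes(), int(w.getframerate() * seconds))
        w.setpos(w.getnframes() - n)  # jump to the start of the tail
        return w.readframes(n)


def split_transcription(text, mark=CUT_MARK):
    """Split a marked-up transcription into one cleaned string per segment."""
    return [" ".join(part.split()) for part in text.split(mark) if part.strip()]


def write_segment_transcriptions(text, segment_paths, out_dir, mark=CUT_MARK):
    """Write one .txt file per audio segment, pairing pieces and segments in order."""
    parts = split_transcription(text, mark)
    if len(parts) != len(segment_paths):
        raise ValueError("number of marked pieces and audio segments differ")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for part, seg in zip(parts, segment_paths):
        target = out_dir / (Path(seg).stem + ".txt")
        target.write_text(part + "\n", encoding="utf-8")
        written.append(target)
    return written
```

The length check is there because a missed or extra mark would otherwise silently pair every later transcription with the wrong audio segment.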
Ivan
[1] CMUSeg http://www.nist.gov/speech/tools/index.htm
[2] I say 'GUI', 'app', etc. It was actually a lash-up of Emacs macros
and Python scripts.