_____________________________________________________________ _ _ _ (_)_ __ _ __ _ __ ___ _ __ _ __ ___ (_) ___ ___| |_ | | '_ \| '_ \| '__/ _ \ | '_ \| '__/ _ \| |/ _ \/ __| __| | | | | | |_) | | | (_) | | |_) | | | (_) | | __/ (__| |_ |_|_| |_| .__/|_| \___/ | .__/|_| \___// |\___|\___|\__| |_| |_| |__/ _____________________________________________________________ (c) 2008, 2009 inpro project, Timo Baumann funded by DFG Emmy-Noether programme please feel free to contact timo@ling.uni-potsdam.de for any questions and comments. see bottom of this page for license and citation information. _____________________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Filtering of Incremental Results for Sphinx-4 _____________________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ This package contains code to output incremental, non-final results from the Sphinx-4 speech recognition software [1], and filtering algorithms that improve incremental performance. This code should run with the current Sphinx-4 beta 2. __________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Usage as a Sphinx-4 ResultListener __________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ The code works by plugging into the Sphinx ResultListener framework. It then receives regular updates about the recognition process, filters the incoming information (see our paper for details) and passes changes to the current hypothesis on to one or more registered HypothesisChangeListeners. Following the steps below should make this work for you: - Register CurrentHypothesis as a ResultListener of the recognizer. Add the following monitors list in the -configuration (or extend the list): currentHypothesis - Set sphinx to process audio in smaller chunks and to update ResultListeners more often. Notice that the algorithm works regardless of your featureBlockSize. But it is only fun if you run it with small values. Some versions of sphinx must be set specifically to call ResultListeners on non-final results. Both can be accomplished by adding (or replacing) the two following lines in the -configuration: - define incrementalHypothesis as CurrentHypothesis, decide on the filter method you want to use (if any) and register one or more HypothesisChangeListeners to output incremental results. Add the following to your : (leave out the line numbers, of course) 1 2 3 4 5 consoleNotifier 6 7 8 9 10 11 The configuration given above should define a currentHypothesis that registers as a ResultListener to the recognizer (see below!), applies smoothing-filtering (set in line 3 and defined in lines 8-10) with a smoothing-factor of 11 (set in line 9). A factor of 11 leads to a reduction of spurious changes from 9/10 down to 1/2 for our data. Finally, incremental results are output to the console (set in line 5 and defined in line 11). - NOTE: the example above assumes that you have a global property "recognizer" with the value "recognizer", which is the standard case in configurations that come with sphinx. If you don't know what I'm talking about, then you can probably ignore this warning. - Compare the output depending on filter method and filter setting, and find your preferred setting. Also, try out the other HypothesisChangeListeners or even ... - write your own HypothesisChangeListener; please tell us if there is a chance of it being useful to others, so that we can include it in future versions. ________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ Acquiring an Example Application ________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ There is currently no simple example application. Sorry. There is, however, an adapted configuration file for the HelloNGram demo distributed with Sphinx-4. Just overwrite hellongram.config.xml in your sphinx distribution with the one in this directory, recompile the JAR and launch HelloNGram from the bin/ directory of sphinx-4 with java -cp /path/to/IncrASRFilter.jar -jar HelloNGram.jar You should see incremental output while you are speaking. If you use a fast computer, that is. Incremental output is no fun if ASR is running much slower than real-time. If you have trouble setting up the code from this package, please contact timo@ling.uni-potsdam.de for help. _______________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ License and Citation in Scientific Work _______________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ This is free and open source software, released under the General Public License 3. Please see the file COPYING for details. The license sets the requirements for using this software. The requirements in this section are not legally binding; however, they represent things that the author would appreciate: If you write a paper based on work that uses this software, please include a citation to one of our papers, and send a copy of your paper. If you create a derivative work based on a this software, tell me about it and let me know how I can get a copy, and include this request if it’s appropriate. The algorithms in this software are described in Timo Baumann, Michaela Atterer, and David Schlangen (2009). "Assessing and Improving the Performance of Speech Recognition for Incremental Systems". in Proceedings of NAACL-HLT 2009, Boulder, USA. http://www.aclweb.org/anthology/N/N09/N09-1043.pdf The file CITING contains the bibtex entry for your convenience. _______________________________________ ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾ [1]: cmusphinx.sourceforge.net/sphinx4/