I've just started programming my MOVI to select various "diagnostic modes" in my Arduino project. So far I have found that, after receiving the call sign, MOVI is very "aggressive" about matching any further utterance to the words trained with the addSentence() calls in setup(). For example, with only two sentences trained, 1) "spark advance" and 2) "solenoids", if I say a word like "nothing" or "cancel", MOVI thinks I said "advance": that is what getResult() returns after poll() gives a return code of RAW_WORDS, and the positive value I then get from poll() is the sentence number for "spark advance". The resulting false positive rate is too high for my purposes.
I did try the "trick" in the WordSpotter example where one trains a "background model" with a bunch of sentences that contain frequently used English words, but then I get the opposite problem: even if I clearly say "spark advance", MOVI thinks I said some combination of the background model words instead. This just flips the problem on its head: almost everything comes back as a false negative.
I am working in a very quiet "lab" environment with virtually no background noise. I hope to move MOVI into my car where there is going to be a lot more ambient noise around, but for now, I think I can safely say that I'm not suffering from background noise issues. I am a native English speaker; if anything I would think my "Canadian articulation" (not maritime) should be about as optimal as it gets.
I have another post about using an external microphone, but that is unrelated: the behaviour I am describing here is observed with MOVI's built-in microphone.
Is it expected behaviour for MOVI to match utterances to trained words this aggressively? Any advice on making the phonetic matching stricter? Would it help if I put together some example sketches to evaluate?
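To make the problem concrete, here is a minimal sketch of the kind of stricter check I would like to end up with: only accept a sentence when the raw words reported after RAW_WORDS match the trained sentence word for word. This is plain desktop C++ rather than an Arduino sketch, and `normalizedWords` and `isExactMatch` are hypothetical helpers of my own, not part of the MOVI library. I am also assuming the string from getResult() is a whitespace-separated list of words; I have not verified its exact format or letter case.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the MOVI library): split text on
// whitespace and lowercase each word so the comparison ignores case
// and extra spaces.
std::vector<std::string> normalizedWords(const std::string& text) {
    std::vector<std::string> words;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        for (char& c : word) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        words.push_back(word);
    }
    return words;
}

// Hypothetical helper: accept a match only when the raw words are exactly
// the trained sentence, word for word. Assumes rawWords is what getResult()
// returned after RAW_WORDS and trained is the text given to addSentence().
bool isExactMatch(const std::string& rawWords, const std::string& trained) {
    return normalizedWords(rawWords) == normalizedWords(trained);
}
```

With this check, a raw result of "advance" would be rejected for the trained sentence "spark advance", while "SPARK ADVANCE" would be accepted. It only helps when the raw words differ from the trained sentence, so it would not catch a case where MOVI reports both words for an unrelated utterance. On the Arduino itself the same logic would need to be ported to the Arduino String class.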
[Last edited Apr 07, 2016 03:24:21]