Legacy MOVI User Community Forum (readonly) > MOVI Question & Answers

Recommendations on improving accuracy?

posted Mar 31, 2016 01:45:10 by dmworking247
Firstly, I've been making good progress with MOVI and I'm generally impressed at how well it recognizes sentences spoken in a natural voice. However, I am struggling with its accuracy when key words are reused across sentences.

Example:
Sentence 1: "Turn everything on" is recognized every time.
However, if I add "Turn everything off" as an additional sentence, Movi will usually respond to the first.

Instead, if I use "Shut it all down", Movi will almost 100% of the time recognize 'turn everything on' and 'shut it all down' correctly. Change this to "Shut EVERYTHING down" and Movi can't tell the difference between:
TURN everything ON
and
SHUT everything DOWN

Movi seems to be prioritizing its match based on the multiple syllable word "everything".

Any advice on how to phrase commands without having to make them too obscure, or restrict them to words that appear in no other sentence?
GeraldFriedland said Mar 31, 2016 02:52:23
That's a very good question. I am glad somebody posted this, so I can answer it.

First of all: Speech recognition is not a completely solved problem. Sometimes stuff just doesn't work and there is no explanation for it. More on this in our user manual, Chapter 4.

Having said that, there are a couple of things you can do. MOVI's recognition works on the phoneme level. That means if you define "Turn everything on" and "Turn everything off" as sentences, the only difference between the two is "n" versus "f". This makes the task hard, especially since both are consonants. This is also the reason why full sentence recognition usually works better than keywords and why our standard example is "Let there be light" and "Go dark".
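
To get a rough feel for how close two candidate sentences are before training them, one can compare their spelling. The sketch below is only a crude stand-in: it measures letter-level edit distance (Levenshtein), not phonemes, and it is not how MOVI matches sentences internally. The function name editDistance is made up for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Letter-level Levenshtein distance between two sentences.
// Assumption: spelling distance is used here as a rough proxy for
// phonetic distance; MOVI itself works on phonemes, not letters.
std::size_t editDistance(const std::string& a, const std::string& b) {
    // prev[j] is the distance between the first i-1 letters of a and the
    // first j letters of b; cur[j] is the row currently being filled.
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t substitute = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            std::size_t remove = prev[j] + 1;
            std::size_t insert = cur[j - 1] + 1;
            cur[j] = std::min(substitute, std::min(remove, insert));
        }
        std::swap(prev, cur);  // the finished row becomes the previous row
    }
    return prev[b.size()];
}
```

For "turn everything on" and "turn everything off" this gives a distance of 2, while "let there be light" and "go dark" are at least 11 edits apart. Pairs with a very small distance are the ones worth rephrasing.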

Regarding "Turn everything on" vs "Shut everything down": "Turn" and "Shut", and "on" and "down", are very short words. I am pretty sure MOVI doesn't fail 100% of the time, but in the wrong environment this can be hard. So here are two things you can do:
1) In general, try to make sentences differ as much as you can in every syllable. If "shut it all down" vs "turn everything on" works for you, just do that.
2) If you do want to fine-tune, I suggest taking a look at the Serial Monitor while MOVI is running. MOVI outputs the individual words that have been recognized BEFORE it tries to match them to a sentence number. It looks like this:
MOVIEvent[200]: CALLSIGN DETECTED
MOVIEvent[140]: ACTIVELISTEN
MOVIEvent[141]: END ACTIVELISTEN
MOVIEvent[201]: SHUT EVERYTHING ON
MOVIEvent[202]: #1

Seeing where the ambiguities are should help you vary the sentences.
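
If the Serial Monitor output is saved to a file, the comparison can be done mechanically. The following is a minimal sketch in standard C++, meant to run on a PC rather than on the Arduino. It assumes the "MOVIEvent[<code>]: <text>" layout from the excerpt above, where event 201 carries the raw words and 202 the matched sentence number. The names MoviLogLine, parseMoviLogLine and mismatchedWords are hypothetical helpers, not part of the MOVI library.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// One parsed log line, e.g. "MOVIEvent[201]: SHUT EVERYTHING ON"
// becomes code 201 and text "SHUT EVERYTHING ON".
struct MoviLogLine {
    int code;
    std::string text;
};

// Parses a line in the assumed "MOVIEvent[<code>]: <text>" layout.
// Returns false if the line does not follow that layout.
bool parseMoviLogLine(const std::string& line, MoviLogLine& out) {
    const std::string prefix = "MOVIEvent[";
    if (line.compare(0, prefix.size(), prefix) != 0) return false;

    std::size_t close = line.find("]: ", prefix.size());
    if (close == std::string::npos) return false;

    std::string digits = line.substr(prefix.size(), close - prefix.size());
    if (digits.empty() ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        return false;
    }
    out.code = std::atoi(digits.c_str());
    out.text = line.substr(close + 3);  // skip the "]: " separator
    return true;
}

// Splits a sentence on whitespace.
std::vector<std::string> splitWords(const std::string& sentence) {
    std::istringstream in(sentence);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}

// Returns the word positions (0-based) where the recognized words differ
// from the intended sentence. Missing or extra words count as mismatches.
// The comparison is case-sensitive, so pass the intended sentence in
// upper case to match the log.
std::vector<std::size_t> mismatchedWords(const std::string& recognized,
                                         const std::string& intended) {
    std::vector<std::string> r = splitWords(recognized);
    std::vector<std::string> t = splitWords(intended);
    std::vector<std::size_t> diff;
    std::size_t n = std::max(r.size(), t.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (i >= r.size() || i >= t.size() || r[i] != t[i]) diff.push_back(i);
    }
    return diff;
}
```

Comparing the logged "SHUT EVERYTHING ON" against the intended "TURN EVERYTHING ON" reports position 0, meaning the first word is the one being misheard.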

An additional note: I tried your two 'everything' sentences and MOVI works just fine for me. I suspect you have some echo and/or some additional noise in the room? If so, I suggest playing with the setThreshold() parameter as well as using a headset microphone (at least for debugging).

Ultimately, as I said, speech recognition is not straightforward. It makes it even more of a fun challenge to tinker with... ;-)

I hope that helped!
[Last edited Mar 31, 2016 02:54:34]
Bertrand said Mar 31, 2016 04:21:34
Here is another trick I have been using when controlling a light with MOVI: use context to decide what to do.
Since "turn the light on" and "turn the light off" are so close (but easy to remember!), I keep a variable in my program that holds the state of the lamp.
It is unlikely that someone would want to turn on the light when it is already on, or turn it off when it is already off. So whenever MOVI recognizes "turn the light on" OR "turn the light off", I look at the current state of the lamp and flip it: if the lamp was on I turn it off, and vice versa.
Using context will greatly improve the behavior of your application.
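
The toggle idea might look roughly like the sketch below. It is written in plain C++ so the logic can be checked on its own; the sentence numbers and the names Lamp and handleSentence are assumptions for illustration, not something MOVI defines.

```cpp
// Hypothetical sentence numbers. In a real sketch these would be whatever
// numbers the two trained sentences received.
const int SENTENCE_LIGHT_ON = 1;
const int SENTENCE_LIGHT_OFF = 2;

// The program's own record of the lamp state.
struct Lamp {
    bool on;
};

// Treats "turn the light on" and "turn the light off" as the same request:
// flip the lamp, whatever state it is in. Returns true if the state changed.
bool handleSentence(int sentence, Lamp& lamp) {
    if (sentence == SENTENCE_LIGHT_ON || sentence == SENTENCE_LIGHT_OFF) {
        lamp.on = !lamp.on;
        return true;
    }
    return false;  // some other sentence; leave the lamp alone
}
```

On the Arduino, the sentence number reported by the recognizer would be passed in, and the output pin would then be set to match lamp.on. The trade-off is that the stored state must stay in sync with the real lamp, so this works best when the program is the only thing switching it.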
dmworking247 said Mar 31, 2016 05:49:06
Thanks for the feedback, I will experiment with all of those.

The state-driven context is a particularly important point. In these cases the sentence could focus on "turn the playstation" rather than on the "on" and "off", and the program could simply reverse the current known state.