MOVI without MOVI hardware

posted Jun 17, 2016 21:09:53 by Mark Haun
Hi Gerald and co,

We last conversed about a year ago (following your talk at ESC '15). It's good to see the MOVI project doing well. Which means it's a good time to revisit my earlier question...

If I understand correctly, the main thing keeping MOVI-style speech recognition and synthesis out of reach of most hobbyists is some combination of
(1) a packaging issue--the software is [mostly?] free but many packages must be integrated properly, and
(2) an IP issue--the right models, dictionaries, etc. come with a price tag.

What I would really like to see is MOVI (or its equivalent) as a software-only distribution that folks could run on an existing computer for integration into a home-automation system, VOIP, and so on. I understand that MOVI on RPi, for example, could be integrated into my existing network, but I already have a fast ARM machine (NVidia Jetson board) for mailserver/webserver/NAS/wifi-AP/openhab/freeswitch/etc duties. The speech UI belongs there too; besides, it means one less Linux machine to administer/update/worry about, always a good thing.

All of this just to make clear that I'm not trying to weasel out of paying $$ for a working system.

What do you see standing in the way of a SW-only solution for Linux users? Does it boil down to IP licensing? Can someone figure out a way to deliver the open-source stuff separately, then sell the magic binary blob for a reasonable price?

Even reasonable-quality speech synthesis (let alone speech recognition) is still effectively out of reach for 99+% of Linux users on the desktop, in the year 2016. This seems incredible to me, and I wonder what we could do to improve the situation. Obviously, MOVI is a great start, but it is not the whole answer either.

In the short term, is it feasible to "teleport" the executable environment from the MOVI onto another armhf architecture machine, and run it there? Does my purchase of MOVI give me a license to transfer the IP onto another machine?

Regards,
Mark
GeraldFriedland said Jun 19, 2016 01:24:39
Mark,

Lots of good questions!

Apart from all the software integration and IP issues you mention, don't forget that MOVI also brings you a pre-made front end. Projects like Jasper (http://jasperproject.github.io/) have tried to do what you are proposing, albeit completely open source. But apart from being very hard to install and only able to recognize single keywords (e.g. no distinction between "one two three" and "three two one"), they also have a huge issue with the audio front end. See for example: https://github.com/jasperproject/jasper-client/issues/15 or https://github.com/jasperproject/jasper-client/issues/49.
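
To make the keyword-versus-sentence point concrete, here is a minimal sketch in Python. It is not code from Jasper or MOVI; the function names `keyword_match` and `sentence_match` are hypothetical, and the input is assumed to be an already-transcribed string. It only shows why an order-insensitive keyword spotter cannot tell "one two three" from "three two one", while a matcher that compares whole word sequences can.

```python
def keyword_match(transcript, keywords):
    """Order-insensitive: return the set of known keywords found in the transcript."""
    return set(transcript.lower().split()) & set(keywords)


def sentence_match(transcript, sentences):
    """Order-sensitive: return the index of the sentence whose word
    sequence equals the transcript, or None if nothing matches."""
    words = transcript.lower().split()
    for index, sentence in enumerate(sentences):
        if words == sentence.lower().split():
            return index
    return None


keywords = ["one", "two", "three"]
sentences = ["one two three", "three two one"]

# Both utterances yield the same keyword set, so they are indistinguishable.
same = keyword_match("one two three", keywords) == keyword_match("three two one", keywords)

# Comparing full word sequences separates them.
first = sentence_match("one two three", sentences)
second = sentence_match("three two one", sentences)
```

A real recognizer works on audio and scores hypotheses probabilistically; the sketch only captures the difference in what the two approaches are able to distinguish.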

So, ultimately, the only way to guarantee a good user experience with MOVI is to package both hardware and software into one integrated product. Also, timing is an issue. Right now, nothing else runs on MOVI's CPU. If you have to share the load between different programs, things can get tricky in real time.

Keep in mind that there are still companies out there that will charge hundreds of thousands of dollars to create something like a speech-recognition-based calling system, a task that you can do with MOVI in an hour of work.

I internally experimented with porting MOVI to other armhf CPUs and, yes, it works except for the issues mentioned above. So if you are interested in a larger scale solution where we create an OEM version or something like that for another board, let's talk. It's definitely possible.

With a standard purchase of one MOVI board, I am afraid, you cannot transfer the software to another board, for various reasons.

Gerald
Mark Haun said Jun 20, 2016 03:22:54
Hi Gerald,

What those Jasper issues tell me is that sound on Linux is a horrible mess (which it is). I was thinking microphone quality, minor differences in spectral response, etc., but no, the issues are clipping and the whole zoo of software problems with ALSA or pulseaudio. It would be great if someone finally wrote documentation for ALSA, but, for someone who has worked through the madness before, this is not a showstopper.
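
As a rough illustration of the clipping problem, the following Python sketch measures how much of a capture sits at full scale. It assumes signed 16-bit PCM samples already read into a list of integers; the function name `clipped_fraction` and the `margin` parameter are hypothetical and not part of ALSA, PulseAudio, Jasper, or MOVI.

```python
def clipped_fraction(samples, full_scale=32767, margin=0):
    """Return the fraction of samples at or beyond full scale.

    samples    -- iterable of signed 16-bit PCM values (assumed input format)
    full_scale -- largest positive sample value for 16-bit audio
    margin     -- how far below full scale still counts as clipped
    """
    samples = list(samples)
    if not samples:
        return 0.0
    limit = full_scale - margin
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


# Two of these four samples sit at the 16-bit limits.
ratio = clipped_fraction([0, 32767, -32768, 100])
```

A capture where more than a tiny fraction of samples is clipped usually means the input gain is set too high, which is a mixer setting rather than a microphone quality problem.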

Packaging HW and SW is what Apple does to ensure a good user experience, and it works. But it also limits the impact you can have. The future will still belong to the highly centralized and controlled products like Echo, Siri/Cortana/Google, etc. and the Linux world will do without. Of course MOVI doesn't have to change the world in order to be successful, but it doesn't hurt to aim high :)

So, correct me if I'm wrong, but I don't see anything to dissuade me from my original assumption, which is that the *essential* difference between MOVI and projects like Jasper is in the licensed IP. This seems to me the strongest argument in favor of the HW/SW integrated solution: IP owners are happy to license for SW embedded into a physical object, but unwilling to risk licensing for a SW-only product which would be susceptible to unlimited copying. In other words, the classic market failure that dogs open-source projects... If all the users chipped in $1, the IP owners would probably get more revenue than they do now, but how to ensure that without DRM, which is anathema in that community?

Anyway, it would be helpful / educational to see a list of the non-free SW/data in MOVI. After all, you've credited all of the open-source packages, so why not the other bits?

My wish for Audeme is that you make a healthy profit on MOVI, *then* find a way to get one of the large research institutions to release their speech databases and models into the public domain. I think that is the only hope if you want to really change the course of the future, long term.

Mark