Hi @Xenium,
I installed Konele but I can’t get it to recognise English. This only appears to be available in the grammar mode, but even then I can’t get it to work. Does it only work in specific apps?
Everyone
Almond doesn’t have an STT engine; it needs one from another app. So far I have not tested it much.
Open Jarvis is intended for home automation running on a Raspberry Pi, so its Android app only works with that.
I also found SUSI.AI but haven’t used it yet because it needs a login. I don’t know if it includes an STT engine.
There is a DeepSpeech demo app which can be built for Android here:
I don’t know exactly what it does because I’ve not tried it yet.
More generally, I found that STT engines for Android are very rare! To understand this I’ll explain what I’ve found.
First, we can break down the “voice assistant” into two parts:
- STT engine to convert voice to text
- AI “assistant” which interprets the resulting text to commands to execute on the phone or over the internet
We can have two separate apps for this and that may actually be an easier way to do it (also the STT engine can be used for voice typing as well).
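To make the split concrete, here is a rough Python sketch of the two parts as separate, swappable pieces. Everything in it is hypothetical and only for illustration: `SttEngine`, `FakeSttEngine`, `interpret` and `handle_voice` are names I made up, the fake engine just pretends the audio bytes are already text, and the "assistant" is a trivial keyword matcher rather than real AI.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class SttEngine(Protocol):
    """Part 1: anything that can turn raw audio into text (hypothetical interface)."""

    def transcribe(self, audio: bytes) -> str: ...


class FakeSttEngine:
    """Stand-in engine for illustration: treats the audio bytes as UTF-8 text."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


@dataclass
class Command:
    action: str    # what to do, e.g. "open"
    argument: str  # what to do it with, e.g. "camera"


def interpret(text: str) -> Optional[Command]:
    """Part 2: a very naive 'assistant' that maps the first word to an action."""
    words = text.strip().lower().split(maxsplit=1)
    if not words:
        return None
    verb = words[0]
    rest = words[1] if len(words) > 1 else ""
    if verb in ("call", "open", "search"):
        return Command(action=verb, argument=rest)
    return None  # not a command we understand


def handle_voice(engine: SttEngine, audio: bytes) -> Optional[Command]:
    """Glue: the assistant only ever sees text, so any STT engine can be plugged in."""
    return interpret(engine.transcribe(audio))
```

Because the assistant side only receives text, the same engine could equally feed a keyboard for voice typing, which is the reason for keeping the two apps separate.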
There’s also a subtlety here: “Assistant” apps have a clear place to be set as default in the Android settings. For STT engines, however, it is not so clear; there are in fact three places:
- Voice Input setting - in default apps, assist app, you can set a default app for voice input. Konele provides this service but very surprisingly most voice keyboards and assistant apps do not
- Virtual keyboard setting - Apps which can do their own STT for typing but don’t make it available to other apps, e.g. Gboard
- Assistant app! Some assistant apps do their own STT but don’t make it available for other uses, e.g. Cortana and Alexa.
So, for a true STT engine we need it to provide the voice input service (and others; this is well described in the Konele docs).
Most voice typing apps, including keyboards that can do STT, actually use the Google STT engine, so if Google isn’t present they don’t work.
Try to find a non-Google STT engine which you can use with other apps - they exist, but are almost never open source. E.g. Swift Keyboard, Dragon Dictation.
So far, the most complete option I have found overall is Kõnele, but it needs a network connection and so far is only available in Estonian (unless you can get it to work in English or provide an English language server).
It would be preferable to have an on-device STT engine, but these are quite demanding on resources. Apparently DeepSpeech with TensorFlow Lite can run on an RPi4, so it should be okay on most smartphones.
So is this voice assistant for /e/ on the roadmap, does anyone know?
Cheers