Questions On The New Speech To Text

I have some questions about the new speech to text feature (the one using OpenAI Whisper). I know there is another thread on the issue, but it’s been hijacked by high emotions and extrapolation as to what is actually going on. @GaelDuval I’d also like to apologize for my part in that.

The main questions I have that haven’t been addressed thus far are these:

#1 Does an /e/ OS user’s voice ever get sent to OpenAI’s servers?
#2 If this feature is not used, does any data ever get sent to their servers, whether or not it has been anonymized?

Answers to these two questions would help to clear up a lot of people’s concerns about the feature. If the answer to both is “no” then I don’t see any privacy violation here.

And now for a little of what I think, depending on what the answers are:

If the answer to #1 is “yes”, that makes this feature one I won’t ever use, but others are free to see it differently. It seems trivial for OpenAI to deanonymize the actual voice recording. I would ask that, when the feature is used for the first time, a warning be shown explicitly stating that the user’s voice will be shared with OpenAI.

For #2 it seems mostly obvious that this wouldn’t happen. But the question has been asked and it would be nice to have some explicit clarification. If the answer is “yes” then I feel it should be removed from /e/ OS completely. People don’t want any data going to OpenAI without their consent, even if anonymized.

The purpose of this thread is to ask questions and figure out what is actually happening. Please take any emotional responses to the other one. I fully support the removal of any posts that don’t comply with this request (at least in this specific thread).

5 Likes

I echo these sentiments. I would also like to know if there is a way to remove this feature if you don’t have a need for voice to text. In the meantime I have revoked the app’s permissions to everything on my phone.

2 Likes

I shall get a response from the development team on this.

5 Likes

I would also like to be able to do this. I don’t need voice to text and would ideally like to be able to remove it rather than just disable it.

3 Likes

@Manoj Any updates on this? It’s been over a week.

The answers to this were shared by Gaël in his response a couple of days back.
The option to enable it rests with the user. Removing it with commands is not recommended, as doing so can inadvertently break other applications.

To add to the answers already given, we have also released a guide on how the speech-to-text app works.

The guide also includes the source code of the application. Those who understand code can check it and confirm for themselves how it works.
/e/OS has, from its inception, been built for the not-so-technical user. We allow you to use all the social media apps on this OS.

Coming to the AI world: from the little I have seen of it, I can say that in the days to come it will be a huge part of our lives. I am absolutely sure that Android 16 and above will be closely integrated with AI tools, and so will all the utilities and interfaces we come across in our daily lives. For all of us it will be a challenge to adapt to that integrated world. Jobs are going to be lost, and the way we interact with the world will also be via AI tools. Whether we like it or not is a totally different matter, but AI is here to stay.

For /e/OS, as mentioned in the response linked above, we had to find the best way to convert speech into text without compromising the quality of the service offered. We went ahead with an AI implementation, but with an anonymization proxy in between so that the data of those who want to use the tool is safe.
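To make the data flow concrete, here is a minimal illustrative sketch of what such an anonymization proxy does. This is not the actual /e/OS code (that is linked in the guide above); the endpoint and field names follow OpenAI’s public transcription API, and the `/transcribe` route is just an example name.

```python
# Illustrative sketch only -- not the /e/OS production code.
# An anonymizing proxy forwards audio for transcription while stripping
# client-identifying metadata (IP address, account headers, device ids).
import os
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)

OPENAI_URL = "https://api.openai.com/v1/audio/transcriptions"
API_KEY = os.environ["OPENAI_API_KEY"]  # the proxy's own key, shared by all clients

@app.route("/transcribe", methods=["POST"])
def transcribe():
    # The client's IP address and any identifying headers stop at this proxy;
    # only the raw audio bytes are forwarded upstream.
    audio = request.files["audio"]
    upstream = requests.post(
        OPENAI_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": (audio.filename or "audio.wav", audio.stream, audio.mimetype)},
        data={"model": "whisper-1"},
        timeout=60,
    )
    upstream.raise_for_status()
    # Only the transcribed text is returned to the phone.
    return jsonify({"text": upstream.json().get("text", "")})
```

The intent of this design is that the request arriving at the transcription service carries the proxy’s credentials and the audio needed for transcription, but no metadata identifying the individual user.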

The /e/OS developers ensured that your data is not compromised in any way. If, in spite of these responses, any user still feels their data is compromised in some way, we welcome them to check the code and let us know how it can be improved.

Again, we would like to repeat that the tool can only be activated by the user. By default it is not enabled.

Forceful removal of any built-in application is possible but not advised; users who want to do so can do it at their own risk.
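For those who insist on doing it anyway, the usual route is per-user removal over adb from a computer. The sketch below is only an illustration: the package name is a placeholder, not the confirmed package id of the speech-to-text app, and disabling some system packages can be refused on certain builds.

```python
# Sketch only: disable or remove a built-in app for the current user via adb.
# NOTE: "foundation.e.speechtotext" is a PLACEHOLDER, not a confirmed package id.
# Verify the real name first with:  adb shell pm list packages
import subprocess

PACKAGE = "foundation.e.speechtotext"  # placeholder -- replace with the real package id

def adb(*args):
    """Run an adb command and print its output."""
    result = subprocess.run(["adb", *args], capture_output=True, text=True, check=True)
    print(result.stdout.strip())

# Reversible: disable the app for user 0 instead of uninstalling it.
adb("shell", "pm", "disable-user", "--user", "0", PACKAGE)

# Harder to undo: uninstall for user 0. The APK stays in the system image
# and comes back after a factory reset or an OS update.
# adb("shell", "pm", "uninstall", "--user", "0", PACKAGE)
```

As said above, this is at your own risk; removing system components can break other applications that depend on them.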

1 Like

I don’t want to be a jerk, but the FUTO Voice app in combination with the HeliBoard keyboard from F-Droid works completely offline and better than the AOSP keyboard with the Murena speech recognition.

This should be a benchmark for the developers.

And: if I am to use Murena, then I would like to be able to choose the language. That thing mixes up many Slavic languages.

1 Like

@Manoj, neither your post nor the linked sources answer either of the questions this thread was created to ask.

Is there any way we can get a response on the specific two questions listed above? A yes or no answer will suffice for both questions. If I missed the answer somewhere, please point out the specific place I can find it.

I’m sure it’s possible to figure out the answers to these questions by reading through all the source code, but I’m not a developer and wouldn’t understand any of it.

The answer is no to both questions.

Thank you!

To clarify a bit more on the first one:

I assume some kind of voice recording needs to be given to the model, which then converts it to text.

Since the processing is done in the cloud, wouldn’t that require the voice recording to be sent to the cloud? How can the model process speech that it’s never given? Is the vocal data distorted somehow?

I’m no expert, but it seems to me that a proxy could only anonymize the metadata; it wouldn’t change the voice recordings themselves. But those recordings need to actually make it to the model for anything to happen. The model runs on OpenAI’s servers, so I’m confused as to how this can work without sharing voice recordings.

Maybe I’m just totally ignorant as to how any of this works. If I am, anyone please correct me.

Take a look at the FUTO Voice keyboard: you download the models to your phone. I believe Murena could run a “downloaded” model on their own servers instead of rerouting requests, which costs a lot of time.
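For anyone curious what “the model on your phone” means in practice, here is a rough desktop equivalent using the open-source `openai-whisper` Python package. FUTO’s Android app is implemented differently; this only illustrates that a locally downloaded model transcribes without sending audio anywhere.

```python
# Sketch: fully local speech-to-text with a downloaded Whisper model.
# Requires: pip install openai-whisper  (plus ffmpeg on the system PATH).
# Nothing is uploaded; the model runs entirely on the local machine.
import whisper

model = whisper.load_model("base")          # downloaded once, then cached locally
result = model.transcribe("recording.wav")  # language is auto-detected by default
print(result["text"])

# Forcing the language avoids the kind of mix-ups mentioned above:
# result = model.transcribe("recording.wav", language="pl")
```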

I’ve seen the FUTO one before and it seems like it works quite well.

As for the model being on Murena’s servers, Gaël already mentioned that it isn’t, albeit indirectly.

He also mentions earlier in the post that Murena is using some cool trickery to make the process of rerouting faster. Murena’s STT server is nothing more than a proxy that anonymizes some data before forwarding it to OpenAI.

Then at least one of the “no” answers might not be correct.

I agree that, like it or not, AI is going to be more and more a part of our lives; however, my comment is only with respect to Android.

I have a phone running stock Android as well as a phone running /e/os. The stock Android phone is on 15. On that phone I’m able to shut off pretty much all the AI junk. Assistant is off, Gemini is disabled, and all the “smart” settings like text prediction and text suggestions are off. So my stock Android phone is essentially devoid of any AI that can annoy me in the tasks I do commonly.

Unless Android 16 radically changes things in terms of user control, I don’t foresee a problem. I know 16 will obsolete the Google Assistant in favor of Gemini, but in 15 I can disable it. I’d expect a way to shut off or disable Gemini in 16 as well. But I don’t know; Google could take away all options. That’s typically what happens: more useless features to get in the way and less control.

In any case by the time Android 16 rolls around on my stock Android phone (it’s a Motorola and they’re always a year late to the party), I expect to have replaced that phone and be off OEM Android altogether. I doubt /e/os is going to force me to do anything AI for quite a while. It took this long just for VTT to get bundled into /e/os and I can shut that off. If /e/os decides to add other AI tools down the road, I’m sure I’ll be able to shut those off too.

Sorry, but this seems not to be true. On Murena’s site it reads:

“This means that while your voice data is sent to OpenAI for transcription purposes, it is first anonymized through our proxy to protect your identity. The only data that reaches OpenAI is the anonymized audio content necessary for the transcription process. Also, this speech to text feature is only running while it’s been explicitly activated by the user by pressing the speech recognition key on the keyboard.”

So it seems to be a “Yes” for the first and a “No” for the second question. Or am I getting something wrong again?

In any case, the published answers are confusing and contradictory.

3 Likes

Thank you @Sebastian for finding a real answer to my questions!

2 Likes