A major privacy problem with smart assistants like Alexa and Siri is that they record the user's specific voice pattern. This voice pattern can be used to identify people from audio recordings.
There is a German university project called Emonymous which attempts to anonymize such speech input directly on the input device. It works like an audio filter that keeps the emotional information and the content while reducing all other voice attributes that support identification.
Maybe the /e/ Foundation can incorporate future results of this university project into the Elivia smart assistant to add an extra layer of data privacy.
DeepL translation of the German project description:
Interactive intelligent voice assistants are conquering the home. The goal of the project is to completely anonymize the speaker identity without losing emotional and speech content information.
The Emonymous project (long title: "Speaker anonymization while preserving emotional expressiveness") is concerned with the research and development of new methods to completely anonymize the speaker’s identity when using intelligent voice assistants while preserving linguistic as well as emotional information to the greatest extent possible. This enables the use of interactive and intelligent voice assistants even in privacy-sensitive areas such as the health sector or learning support. The core of the interdisciplinary work is the development of a novel speech synthesizer and a perceptual similarity measure. The speech synthesizer should be able to generate high-quality speech recordings from the original speech data, which preserve the important speech information and disguise the identity of the speaker. For this purpose, already existing modules for instrument modeling (e.g. Differentiable Digital Signal Processing) are to be partially adapted and supplemented by further speech-specific modules within the scope of the project. A new differentiable similarity measure, still to be developed, will measure the difference between the input and the reconstruction of the speech data. It will enable the training of the aforementioned AI methods, which can anonymize the speaker and simultaneously check whether both speech information and emotionality have been preserved.
- Otto von Guericke University Magdeburg, Department of Mobile Dialogue Systems, 39106 Magdeburg, Germany
- DFKI, Department of Speech and Language Technology, 10559 Berlin, Germany
Duration: 08/01/2021 to 07/31/2023
Translated with www.DeepL.com/Translator (free version)
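To make the "similarity measure between input and reconstruction" idea from the description a bit more concrete, here is a minimal sketch of a generic log-spectral distance between two audio signals. This is only an illustrative stand-in and not the project's measure: Emonymous aims at a perceptual measure that still has to be developed, and a real training setup would compute something like this inside an autodiff framework so that gradients are available. All names here (`frame_magnitudes`, `log_spectral_distance`, `tone`) and the frame sizes are my own assumptions, and the code uses only the Python standard library.

```python
import cmath
import math


def frame_magnitudes(signal, frame_size=64, hop=32):
    """Magnitude spectra of overlapping frames, using a naive DFT."""
    spectra = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = signal[start:start + frame_size]
        spectrum = []
        # Only the non-negative frequency bins are needed for real input.
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            spectrum.append(abs(acc))
        spectra.append(spectrum)
    return spectra


def log_spectral_distance(original, reconstruction, eps=1e-8):
    """Mean squared difference of log-magnitude spectra.

    Hypothetical stand-in for a similarity measure: 0.0 for identical
    signals, larger values for signals whose spectra differ more.
    """
    spec_a = frame_magnitudes(original)
    spec_b = frame_magnitudes(reconstruction)
    if not spec_a or len(spec_a) != len(spec_b):
        raise ValueError("signals must have equal length of at least one frame")
    total = 0.0
    count = 0
    for frame_a, frame_b in zip(spec_a, spec_b):
        for mag_a, mag_b in zip(frame_a, frame_b):
            total += (math.log(mag_a + eps) - math.log(mag_b + eps)) ** 2
            count += 1
    return total / count


def tone(freq_hz, sample_rate=8000, n=256):
    """Synthetic sine tone used as toy test input instead of real speech."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]


if __name__ == "__main__":
    same = log_spectral_distance(tone(440), tone(440))
    different = log_spectral_distance(tone(440), tone(880))
    print("identical signals:", same)
    print("different signals:", different)
```

An anonymizer would of course not minimize such a distance over the whole signal, because that would just reproduce the original voice. The point of the project, as I understand the description, is a measure that compares the content and the emotional expression while ignoring the speaker-specific attributes.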