Artificial intelligence is present in our daily lives, and even in our jobs, thanks to its rapid and eye-catching development. Facial recognition algorithms are among the most popular and tangible results of AI.
Researchers at the Massachusetts Institute of Technology (MIT) developed an algorithm capable of virtually reconstructing people's faces simply by analyzing their voice.
This algorithm, called Speech2Face, can infer the ethnicity of the person speaking, as well as their age, gender, and certain facial features.
Speech2Face works thanks to a learning network created by the developers themselves, trained on a database called AVSpeech. This database houses more than 100,000 video and audio clips of different people, each a few seconds long.
In the first testing phase, the videos contained in AVSpeech were analyzed. The images and audio laid the foundations of the algorithm's facial recognition, focusing on characteristics such as ethnicity, gender, age, and skull measurements.
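At a very high level, the pipeline described above takes audio in and produces a canonical face out. The sketch below is purely illustrative; the function names, dimensions, and random projections are assumptions standing in for the trained networks MIT actually used:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_voice(spectrogram: np.ndarray) -> np.ndarray:
    """Illustrative voice encoder: reduce a (time, freq) spectrogram
    to a fixed-size embedding. A real model would use a trained CNN."""
    pooled = spectrogram.mean(axis=0)                # average over time -> (freq,)
    projection = rng.standard_normal((pooled.size, 512))  # stand-in for learned weights
    return pooled @ projection                       # (512,) voice embedding

def decode_face(embedding: np.ndarray) -> np.ndarray:
    """Illustrative face decoder: map the voice embedding to a small
    'canonical face' image (neutral expression, frontal pose)."""
    projection = rng.standard_normal((embedding.size, 64 * 64))
    return (embedding @ projection).reshape(64, 64)

# A few seconds of fake audio: 300 time frames x 257 frequency bins.
spectrogram = rng.standard_normal((300, 257))
face = decode_face(encode_voice(spectrogram))
print(face.shape)  # (64, 64)
```

The key design point this mirrors is that the model never outputs the speaker's actual photo; it maps the voice to a generic, front-facing face built from features the voice correlates with.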
To demonstrate the effectiveness of Speech2Face, the developers used another database, VoxCeleb, which contains audio from thousands of interviews with show-business personalities.
The image generated in this test was that of a person with a neutral expression, facing forward, which was then compared with the real image of the singer or actor in question.
Although the results cannot be called an exact replica of the face of the person whose voice was analyzed, there was a visible similarity, and when it came to determining gender, precision rose to 94%.
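The 94% figure is simply a classification accuracy: the fraction of test clips where the predicted gender matches the real one. As a minimal illustration (the labels below are made up, not VoxCeleb data):

```python
# Hypothetical predicted vs. true gender labels for ten clips;
# accuracy is just the fraction of positions where they match.
predicted = ["F", "M", "M", "F", "M", "F", "M", "M", "F", "M"]
actual    = ["F", "M", "M", "F", "M", "F", "F", "M", "F", "M"]

matches = sum(p == a for p, a in zip(predicted, actual))
accuracy = matches / len(actual)
print(f"gender accuracy: {accuracy:.0%}")  # gender accuracy: 90%
```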
MIT researchers say they are still running tests to further improve the effectiveness of their algorithm.
While this technology is amazing, it makes us reflect on the security measures that must be implemented accordingly. Many devices use facial recognition as an access lock, and with this type of application, those locks could be violated. In this regard, the developers point out that their model cannot recover a person's true identity from their voice (that is, an identical image of their face). It is only able to select visual characteristics that are common to many people and assign them according to the voice recording obtained and previously analyzed by the algorithm.