Prasanta Kumar Ghosh, Associate Professor at the Indian Institute of Science (IISc), Bengaluru, has developed several patented voice technologies using Artificial Intelligence (AI), Machine Learning (ML), and Augmented Reality (AR). But his love for science and technology developed early on, when he was in school.
“As my father had struggled significantly in his day to get the right education, he made sure that I was well educated and had the right tutoring and mentorship. But what really excited me in my high school days in 1996 was electronics and the work that ISRO was doing then,” Prasanta tells YourStory.
His father was a government employee and his mother was a homemaker. For Prasanta, the aim was to get a job immediately after graduation. Despite his love for the work ISRO was doing at the time, he couldn’t take up the organisation’s offer, as he was already working elsewhere.
“After graduating in electrical engineering from Jadavpur University in 2003, it was exceedingly important for us that I find a job. I started applying everywhere I could, which brought me job offers from different places, and I started working at Usha Comp Private Limited in Kolkata,” he narrates.
The world of research
However, he was never really interested in holding a job; he wanted to do research in electrical engineering and newer technologies. “I explained to my father that quitting the job may seem like a tough call, but in the long run, it would pay more dividends,” explains Prasanta.
Thereafter, he attempted the IISc entrance exam to pursue postgraduate studies.
“My rank was 489, thus I missed out on a lot of IITs and even IISc. My friends joined IISc and they kept telling me that there was a vacancy for a research position at the institute. I cleared the exam in 2004, and then was selected for the programme,” he says.
It was during this time that he got an offer from ISRO. While pursuing his MSc at IISc and simultaneously working there, he realised he could build and work on solving significantly larger problems.
“The faculty members at IISc were nothing short of inspirational. Their style of teaching and the way they inspired people to pursue research made me fall in love with the field, and I decided to take on the role of a researcher,” adds Prasanta.
This meant putting in a lot of hard work academically. Despite having a job offer from a startup, Prasanta decided to continue on the academic route. In 2006, he became a research intern at Microsoft Research India, where he focused on audio-visual speaker verification.
Speech compression
“I focused my research and work on speech compression. When you speak on a phone today and record the conversation, the voice is transmitted to your friend after compressing the audio. My work is around non-uniform sampling-based compression. Any waveform can be sampled across three key locations. You don't have to look at the whole signal or all the samples, but the key locations are compressed and reconstructed,” explains Prasanta.
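The idea Prasanta describes, keeping only a few key locations of a waveform and rebuilding the rest from them, can be illustrated with a minimal sketch. It is not his method: the choice of local peaks and valleys as the key locations, the straight-line reconstruction between them, and the function names key_locations, compress, and reconstruct are all assumptions made here for illustration.

```python
import math


def key_locations(signal):
    """Return indices of the two endpoints and every local peak or valley.

    Assumption for illustration: the "key locations" are local extrema.
    """
    if len(signal) < 2:
        raise ValueError("need at least two samples")
    idx = [0]
    for i in range(1, len(signal) - 1):
        left = signal[i] - signal[i - 1]
        right = signal[i + 1] - signal[i]
        if left * right < 0:  # slope changes sign: a peak or a valley
            idx.append(i)
    idx.append(len(signal) - 1)
    return idx


def compress(signal):
    """Keep only (index, value) pairs at the key locations."""
    return [(i, signal[i]) for i in key_locations(signal)]


def reconstruct(samples, length):
    """Rebuild a full-length signal by drawing straight lines between kept samples.

    Assumption for illustration: linear interpolation stands in for whatever
    reconstruction a real codec would use.
    """
    out = [0.0] * length
    for (i0, v0), (i1, v1) in zip(samples, samples[1:]):
        for i in range(i0, i1 + 1):
            t = (i - i0) / (i1 - i0)
            out[i] = v0 + t * (v1 - v0)
    return out


# A toy "voice" signal: five cycles of a sine wave over 400 samples.
signal = [math.sin(2 * math.pi * 5 * n / 400) for n in range(400)]
samples = compress(signal)
restored = reconstruct(samples, len(signal))
max_error = max(abs(a - b) for a, b in zip(signal, restored))

print(f"kept {len(samples)} of {len(signal)} samples")
print(f"largest reconstruction error: {max_error:.3f}")
```

The sketch keeps the signal exact at the stored locations and approximate everywhere else, which is the trade-off any such scheme has to manage: fewer stored samples means a smaller payload but a rougher reconstruction.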
He went on to publish his research, which fetched him a thesis award. It also made him realise that he could do more work in the space. “It got me to look at the other options in the field of speech and I started looking at places in the US where I could do my PhD,” he adds.
He went on to receive his PhD in electrical engineering from the University of Southern California (USC), Los Angeles, in 2011. It was there that he learned how interdisciplinary work is done.
Multidisciplinary approach
“I worked at the intersection of science and technology. I worked with linguists, engineers, mathematicians, and others to build speech-recognition technology. I understood how an FM (frequency modulation) transmitter generates signals and this was the base for understanding how human speech worked,” explains Prasanta.
He also worked on a special electromagnetic programme at USC that would record and track the movements of the lips, tongue, and jaw while speaking. This further led to building the different voice recognition modules.
“I had this idea when I was in LA, which has a large Hispanic population that preferred speaking Spanish to English. I had a project where the speech of the doctor, which was in English, could be translated to Spanish so that the patient could understand them,” he explains.
During 2011-2012, Prasanta was with IBM India Research Lab (IRL) as a researcher. He was also awarded the INSPIRE Faculty Fellowship from the Department of Science and Technology (DST), Government of India in 2012.
“At IBM, I worked more around intent classification in speech. For example, if someone asks ‘should I carry an umbrella tomorrow’, they actually want to know the weather for tomorrow,” says Prasanta. He also worked on text analytics and on detecting intent in text.
He rejoined IISc after his stint at IBM. Having worked on speech recognition, his next step was audio-visual speech recognition.
“We speak with gestures, and it is important to understand how gestures can create realistic animation. We have an OptiTrack motion camera device that can record someone’s gestures when they speak, which can help in understanding speech behaviour,” he explains.
Working on healthcare
Prasanta has also worked with hospitals such as NIMHANS and St John’s in Bengaluru. “Using the sound of your voice, we can, for example, try and understand how much the lung is congested. With HCG Hospital, we are trying to understand if you have a problem with your voice box. Many cancer patients have lost their voice box; we are trying to convert their speech into natural speech. Other than that, we are working to detect and improve the condition of patients with neurological problems who have a problem in speaking,” says Prasanta.
Now, he is working on speech recognition and voice technology using AI, ML, and AR. These technologies offer the promise of improving livelihoods, especially in rural parts of India.
However, while India is home to 22 official scheduled languages, and a total of 6,661 mother tongues, leading internet companies in India are currently focusing only on five or six Indian languages.
Although the market is still nascent, the lack of investment in local languages and dialects is one of the fundamental bottlenecks for the growth of voice technology in the country. Prasanta’s project aims at addressing this bottleneck by reaching out to the wider Indian language base and laying the foundation to make it beneficial for the masses.
Advising young techies, Prasanta says, “Find out what you are really passionate about and focus on that. Once you decide to go all out to build and work on your project, find the support of the right people. Anything you do today needs multiple people coming together, and then everything will fall in place.”
Edited by Kanishk Singh