'SAMENA Daily' News

Pakistani researchers set to build Urdu speech recognition system
Category: South Asia
Publishing Date: January 23, 2017
 

An Urdu speech recognition program has become considerably more likely now that the most fundamental tool needed to build one has been developed at Lahore’s Information Technology University.

Linguistic technology expert Dr Agha Ali Raza and his team at ITU’s Center for Speech and Language Technologies (CSaLT) laboratory have released for public use a corpus of Urdu sentences that covers all possible distinct sounds, called phonemes by linguists, used in everyday speech.

This corpus, comprising 708 sentences that cover all 63 phonemes, will soon be available for download at the CSaLT website.

Those interested in developing Urdu speech recognition software will now have access to the most basic ingredient needed for the purpose.

They will just need a repository of words used in everyday speech to proceed with developing the application, says Dr. Raza.

“Speech recognition is a two-step process. The corpus will give the computer application access to all possible phonemes used in formation of meaningful Urdu words from everyday speech,” he says.

Though there are 63 distinct phonemes in Urdu, in everyday speech these don’t correspond to 63 distinct sounds. Dr. Raza explains that the sound made for a phoneme may vary from one utterance to another, depending on the phonemes used before and after it in a word.

Thus, he says, for every phoneme x, there will be 63x63 possible (tri-phoneme) sounds. The corpus of sentences covers all these possible sounds.
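The arithmetic behind the tri-phoneme figure can be checked with a short Python sketch. It assumes only the 63-phoneme inventory reported above; the labels `p0` to `p62` and the helper `triphone_contexts` are placeholders invented for illustration, not the actual Urdu phoneme symbols or anything from the CSaLT corpus.

```python
from itertools import product

NUM_PHONEMES = 63  # phoneme count reported for Urdu in the article

# Contexts for one fixed centre phoneme: any phoneme before it, any phoneme after it.
contexts_per_phoneme = NUM_PHONEMES * NUM_PHONEMES

# Total (left, centre, right) combinations over the whole inventory.
total_triphones = NUM_PHONEMES ** 3


def triphone_contexts(centre, inventory):
    """Return every (left, centre, right) triple for one centre phoneme."""
    return [(left, centre, right) for left, right in product(inventory, repeat=2)]


# Placeholder labels stand in for the real phoneme symbols (an assumption).
inventory = [f"p{i}" for i in range(NUM_PHONEMES)]

print(contexts_per_phoneme)                      # 3969
print(total_triphones)                           # 250047
print(len(triphone_contexts("p0", inventory)))   # 3969
```

These are theoretical upper bounds. Many of these combinations may never occur in real words, so the number of contexts a corpus has to cover in practice is likely smaller.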

In the first step, words from the corpus will allow the application to train itself in the sounds of various Urdu words.

The separate repository of words will come into play in the second stage, allowing the application to choose the most appropriate words for the output sentences.

“This will enhance accuracy of the software,” Dr. Raza says.
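A toy Python sketch of the second stage described above is given here. It is not the CSaLT team's method: the function `pick_word`, the romanised example words, and the scores are all hypothetical, and the first-stage output is simply assumed to be a list of scored word guesses.

```python
def pick_word(candidates, repository):
    """Return the best-scoring candidate found in the repository, or None."""
    # Keep only the guesses that are real words according to the repository.
    valid = [(word, score) for word, score in candidates if word in repository]
    if not valid:
        return None
    # Among the valid guesses, choose the one the first stage scored highest.
    return max(valid, key=lambda pair: pair[1])[0]


# Hypothetical repository of everyday words (romanised for illustration only).
repository = {"kitab", "pani", "ghar"}

# Hypothetical first-stage output: word guesses with made-up scores.
candidates = [("kitap", 0.48), ("kitab", 0.45), ("kitav", 0.07)]

print(pick_word(candidates, repository))  # kitab
```

Here the highest-scoring guess, "kitap", is rejected because it is not in the repository, so the lower-scoring but valid "kitab" is chosen. This is the sense in which a word repository can rule out meaningless output.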


Thus, the accuracy of speech recognition software depends on the written or oral sources from which words and sentences are drawn for the corpus, and on the separately maintained repository used to rule out meaningless words.

 
Source: http://www.dawn.com/news/1310242/pakistani-researchers-lay-the-foundation-for-building-urdu-speech-recognition-system
 