Building a Voice Database for Hindko Digits

Pakistan, Wed Jan 15 2025
Ever wondered why Hindko, a language spoken by nearly eight million people in Pakistan, doesn't have a voice recognition system? That is about to change, thanks to a new public voice dataset. Hindko, spoken mainly in the northwestern parts of Pakistan, is the seventh largest language in the country and the second largest in Khyber Pakhtunkhwa. In areas such as Hazara, Haripur, Abbottabad, and Mansehra, over 80% of the population speaks Hindko. The language has a rich oral tradition, with spoken content that ranges from religion and politics to poetry and theater.

To enhance accessibility and help preserve the language, a voice dataset has been created. It includes 17,597 voice samples covering the Hindko numbers from 1 to 20. The samples were collected from students, staff, and faculty at the Pak-Austria Fachhochschule Institute of Applied Science and Technology. The dataset is a step towards improving digital inclusion for Hindko speakers. Imagine saying the number 'ten' in Hindko and having your device understand you perfectly. That's the goal of this new dataset.

Datasets like this are crucial because voice recognition systems learn from recorded examples of speech. Without them, voice recognition remains a distant dream for many languages, including Hindko. Preserving Hindko's oral tradition in the digital age is important, and a voice recognition system can help do just that. It not only makes technology more accessible but also helps ensure that future generations can engage with their language digitally.