Generalists vs. Specialists: Evaluating Large Language Models for Urdu

Let’s say you’re feeling a bit under the weather. You’ve got a scratchy throat, a nose that resembles a leaky tap, and a cough that rivals a 2-stroke rickshaw without a silencer. You start by calling your GP (general practitioner) who, with their broad training in common ailments, reassures you that it’s probably just a mild cold: nothing that a little rest, some tea, and binge-watching a few seasons of your favourite series can’t fix. ...

September 26, 2024

UQA - Corpus for Urdu Question Answering

I think it was around 1999 when I first heard that Urdu is a low-resource language. Twenty-five years later, Urdu is still considered low-resource despite having over 70 million native speakers. This is because the large, manually curated linguistic resources needed for training are still not available for Urdu. One way around this barrier is to translate existing corpora from a resource-rich language (cough English cough) into Urdu. This seems like a chicken-and-egg problem, though, since good automatic translation systems themselves require training resources. However, recent multimodal multitask models have reached a level where we can use their translations to set a baseline for more complex tasks like question answering. And we have done exactly that! ...

May 24, 2024