LDA-Based Topic Modelling in JavaScript: An Update

I’ve just pushed a JavaScript version of LDA to my GitHub account. It’s based on my earlier work, which no longer functions. For testing, I use a subset of the SMS Spam Corpus available here (and thus take no responsibility for any inappropriate text within 🙂 ). Each topic is represented as a word cloud; the […]

Urdu Sentiment Lexicon

With the growing number of “opinion-dispensing apps” on the web that let Urdu users write in Unicode, there is (or soon will be) a need to extract meaningful statistics from the ever-present sentiment of the masses (or at least the web-savvy subset). This calls for resources which enable automatic processing […]

Twingual: A Twitter client for bilingual tweeple

In my last post, I highlighted some problems that I face daily while using Twitter in Urdu as well as in English. A few days ago, I decided to experiment with the Twitter API and write my own client to fix some of these problems. You can see the result at www.twingual.com. It is a JavaScript […]

Nastaleeq Urdu Typesetting: When will they get it right?

Last night, I read about the new Nastaleeq font available in Windows 8 and I just had to check it out. After leaving my machine on all night to install the consumer preview, I finally had time to examine the new “Urdu Typeset” a while ago. Although Microsoft explicitly states it to be a […]