Urdu Sentiment Lexicon

With the increasing number of “opinion-dispensing apps” which enable Urdu users to write in Unicode out there on the web, there is (or will soon be) a need for getting some meaningful statistics out of the ever-present sentiment of the masses (or at least the web-savvy subset). This calls for resources which enable automatic processing […]

LDA based topic modelling in javascript

Twitter API has changed a bit so this post had to be updated. Check the updated post here Topic modelling means detecting “abstract” topics from a collection of text documents. The most common text book technique to do that is using Latent Dirichlet Allocation. Simply put, LDA is a statistical algorithm which takes documents as […]

Making a copy of WEKA Instances

This ‘thing’ took about 30 minutes to figure out. According to the WEKA documentation, if  you add a new Instance to an existing Instances object, String values are not transferred ! In case you are working on copying a dataset with a string attribute, you need to transfer the string manually. The code segment below […]