Urdu Sentiment Lexicon

With the growing number of “opinion-dispensing apps” on the web that let Urdu users write in Unicode, there is (or soon will be) a need to extract meaningful statistics from the ever-present sentiment of the masses (or at least their web-savvy subset). This calls for resources that enable automatic processing […]

LDA-based topic modelling in JavaScript

The Twitter API has changed a bit, so this post had to be updated. Check the updated post here.

Topic modelling means detecting “abstract” topics in a collection of text documents. The most common textbook technique for this is Latent Dirichlet Allocation (LDA). Simply put, LDA is a statistical algorithm which takes documents as […]
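The excerpt stops before the algorithm itself. As a rough illustration, and not the code from the original post, here is a minimal JavaScript sketch of LDA trained with collapsed Gibbs sampling, one common way to fit it. It assumes documents are already tokenized into integer word ids; the function name `lda`, the hyperparameter defaults (`alpha = 0.1`, `beta = 0.01`) and the seeded `mulberry32` random generator are illustrative choices of this sketch.

```javascript
// Illustrative sketch: LDA fitted with collapsed Gibbs sampling.
// docs: array of documents, each an array of integer word ids in [0, vocabSize).

// Small seeded PRNG (mulberry32) so runs are reproducible; returns floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function lda(docs, vocabSize, numTopics, { alpha = 0.1, beta = 0.01, iterations = 200, seed = 1 } = {}) {
  const rand = mulberry32(seed);
  const zeros = (n) => new Array(n).fill(0);
  const docTopic = docs.map(() => zeros(numTopics)); // n[d][k]: tokens in doc d assigned to topic k
  const topicWord = Array.from({ length: numTopics }, () => zeros(vocabSize)); // n[k][w]
  const topicTotal = zeros(numTopics); // n[k]: all tokens assigned to topic k

  // Start from a random topic assignment for every token and record the counts.
  const z = docs.map((doc, d) =>
    doc.map((w) => {
      const k = Math.floor(rand() * numTopics);
      docTopic[d][k]++;
      topicWord[k][w]++;
      topicTotal[k]++;
      return k;
    })
  );

  const p = zeros(numTopics);
  for (let iter = 0; iter < iterations; iter++) {
    docs.forEach((doc, d) => {
      doc.forEach((w, i) => {
        // Remove the token's current assignment from the counts.
        let k = z[d][i];
        docTopic[d][k]--;
        topicWord[k][w]--;
        topicTotal[k]--;

        // Unnormalized full conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
        let sum = 0;
        for (let t = 0; t < numTopics; t++) {
          p[t] = ((docTopic[d][t] + alpha) * (topicWord[t][w] + beta)) / (topicTotal[t] + vocabSize * beta);
          sum += p[t];
        }

        // Draw a new topic with probability proportional to p.
        let u = rand() * sum;
        for (k = 0; k < numTopics - 1; k++) {
          u -= p[k];
          if (u < 0) break;
        }

        // Record the new assignment.
        z[d][i] = k;
        docTopic[d][k]++;
        topicWord[k][w]++;
        topicTotal[k]++;
      });
    });
  }

  // Point estimates: phi = per-topic word distributions, theta = per-document topic mixtures.
  const phi = topicWord.map((row, k) => row.map((c) => (c + beta) / (topicTotal[k] + vocabSize * beta)));
  const theta = docTopic.map((row, d) => row.map((c) => (c + alpha) / (docs[d].length + numTopics * alpha)));
  return { phi, theta, z };
}

// Toy corpus: word ids 0-2 and 3-5 never co-occur, so two topics are expected to separate them.
const docs = [
  [0, 1, 2, 0, 1],
  [1, 2, 0, 2],
  [3, 4, 5, 3, 4],
  [4, 5, 3, 5],
];
const result = lda(docs, 6, 2, { iterations: 100 });
console.log(result.theta.map((row) => row.map((x) => x.toFixed(2))));
```

On this toy corpus the first two and last two documents usually end up dominated by different topics, but a Gibbs sampler gives no guarantee for any single run; real use would also need tokenization, stop-word removal and a word-to-id mapping.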

Making a copy of WEKA Instances

This ‘thing’ took about 30 minutes to figure out. According to the WEKA documentation, if you add a new Instance to an existing Instances object, String values are not transferred! If you are copying a dataset with a string attribute, you need to transfer the string values manually. The code segment below […]

Google as a Question Answering System

A Question Answering (QA) system is an Information Retrieval system that answers a question posed in natural language. For example, if you ask it “Who wrote Hamlet?”, it should answer “Shakespeare”. A few years ago (don’t ask me how many), search engines did not focus on natural language queries. Recently [sic], Google has […]