Monthly Archives: June 2012
Where does the money go?
Last night, I took a look at the federal budget for 2012-2013. Apparently, about 25% of it will be spent on “Servicing of Domestic Debt”.
Urdu Sentiment Lexicon
With the increasing number of “opinion-dispensing apps” on the web that let Urdu users write in Unicode, there is (or will soon be) a need to extract meaningful statistics from the ever-present sentiment of the masses (or at least the web-savvy subset). This calls for resources that enable automatic processing of sentiment, and one of them is a sentiment lexicon for Urdu. (For people uninitiated in computational linguistics, a lexicon is just a list of words.) Since I couldn’t find any sentiment lexicon for Urdu on the tubes, I decided to put in some effort and create a new one.
Click here to check out the Urdu Sentiment Lexicon
The Urdu Sentiment Lexicon is a list of 2,607 positive and 4,728 negative sentiment/opinion words for Urdu. It is based on a similar list for English available here. The English words were translated to Urdu automatically using a dictionary lookup, and all resulting Urdu synonyms were included as well. The lexicon was also inspected manually (though very quickly), and irrelevant words were deleted.
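The dictionary-lookup step described above can be sketched roughly as follows. This is a minimal illustration, not the script actually used to build the lexicon. The tiny `englishToUrdu` dictionary and the `translateLexicon` function are hypothetical. A real run would load the full English word lists and a complete English-Urdu dictionary. Every Urdu synonym returned by the lookup is kept, and duplicates collapse because the results go into a `Set`.

```javascript
// Hypothetical English -> Urdu dictionary (illustrative entries only).
// Each English word maps to all of its Urdu synonyms.
const englishToUrdu = new Map([
  ["good", ["اچھا", "عمدہ"]],
  ["excellent", ["عمدہ", "بہترین"]],
  ["bad", ["برا", "خراب"]],
  ["terrible", ["خوفناک"]],
]);

// Translate an English word list by dictionary lookup, keeping every
// Urdu synonym. Words without a dictionary entry are skipped.
function translateLexicon(englishWords, dictionary) {
  const urduWords = new Set(); // a Set drops synonyms shared by several English words
  for (const word of englishWords) {
    const synonyms = dictionary.get(word.toLowerCase()) ?? [];
    for (const s of synonyms) urduWords.add(s);
  }
  return [...urduWords];
}

const positive = translateLexicon(["good", "excellent"], englishToUrdu);
const negative = translateLexicon(["bad", "terrible", "awful"], englishToUrdu);
console.log(positive.length, negative.length);
```

In this toy run, "عمدہ" appears only once even though two English words map to it. "awful" has no entry, so it contributes nothing. A lookup like this returns every sense of a word, and some of those senses will not carry sentiment in context. A manual pass over the output is therefore still worthwhile.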
To test things out, I’ve also developed a simple JavaScript application that colors each sentiment word according to its polarity. It also sets the background color of the whole text based on the text’s total polarity score (+1 for each positive word, -1 for each negative word). A screenshot is given above. Like all sentiment lexica, this one won’t be perfect, but I hope it will give fellow researchers working on Urdu sentiment analysis a starting point and save them some time.
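The scoring rule described above (+1 for each positive word, -1 for each negative word) can be sketched like this. This is a minimal illustration, not the code behind the actual application. The whitespace tokenizer, the punctuation set, the names `tokenize`, `scoreText` and `backgroundColor`, and the green/red color mapping are all assumptions made for this example.

```javascript
// Tiny illustrative lexicon sets (a real run would load the full lists).
const positiveWords = new Set(["اچھا", "عمدہ"]);
const negativeWords = new Set(["برا", "خراب"]);

// Split on whitespace and strip common punctuation, including the
// Urdu full stop "۔", comma "،" and question mark "؟".
function tokenize(text) {
  return text
    .split(/\s+/)
    .map((w) => w.replace(/[۔،؟!?.,]/g, ""))
    .filter((w) => w.length > 0);
}

// Total polarity: +1 per positive word, -1 per negative word.
// Each word's label is also returned so it can be colored individually.
function scoreText(text) {
  let score = 0;
  const labels = tokenize(text).map((word) => {
    if (positiveWords.has(word)) { score += 1; return { word, polarity: "positive" }; }
    if (negativeWords.has(word)) { score -= 1; return { word, polarity: "negative" }; }
    return { word, polarity: "neutral" };
  });
  return { score, labels };
}

// Hypothetical mapping from score to background color: white at 0,
// greener for positive totals, redder for negative ones.
// Intensity saturates once |score| reaches maxScore.
function backgroundColor(score, maxScore = 5) {
  const t = Math.min(Math.abs(score), maxScore) / maxScore; // 0..1
  const fade = Math.round(255 * (1 - t)); // 255 = white, 0 = full color
  return score >= 0 ? `rgb(${fade}, 255, ${fade})` : `rgb(255, ${fade}, ${fade})`;
}

const result = scoreText("کھانا اچھا تھا لیکن سروس خراب تھی۔ ذائقہ عمدہ تھا");
console.log(result.score); // 1
console.log(backgroundColor(result.score)); // rgb(204, 255, 204)
```

A plain sum is easy to explain, but it ignores negation and intensifiers. That is one reason why a lexicon-only approach "won't be perfect".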
LDA-based topic modelling in JavaScript
Topic modelling means detecting “abstract” topics in a collection of text documents. The most common textbook technique for doing that is Latent Dirichlet Allocation (LDA). Simply put, LDA is a statistical algorithm that takes documents as input and produces a list of topics. One catch is that you have to tell it how many topics you want. There’s much more to it, but since this is not a tutorial post, I will stop here. (If you are interested in how it works, read the references given on the wiki page.)
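To make "documents in, topics out" a little more concrete, here is a minimal collapsed Gibbs sampler for LDA in JavaScript. It is a sketch and not the implementation used in this post. The function name `lda`, its option names, the default priors (`alpha = 0.1`, `beta = 0.01`), the iteration count and the seeded `mulberry32` generator are all choices made for this example. Documents are arrays of integer word ids, and `K` is the number of topics you have to pick in advance.

```javascript
// Small seeded PRNG (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = seed;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Collapsed Gibbs sampling for LDA.
// docs: arrays of word ids in [0, V); K: number of topics.
function lda(docs, V, K, { alpha = 0.1, beta = 0.01, iterations = 200, rng = Math.random } = {}) {
  const nDocTopic = docs.map(() => new Array(K).fill(0));                    // n(d, k)
  const nTopicWord = Array.from({ length: K }, () => new Array(V).fill(0));  // n(k, w)
  const nTopic = new Array(K).fill(0);                                       // n(k)
  // z[d][i] is the topic assigned to word i of document d; start at random.
  const z = docs.map((doc, d) =>
    doc.map((w) => {
      const k = Math.floor(rng() * K);
      nDocTopic[d][k]++; nTopicWord[k][w]++; nTopic[k]++;
      return k;
    })
  );

  const p = new Array(K);
  for (let it = 0; it < iterations; it++) {
    for (let d = 0; d < docs.length; d++) {
      for (let i = 0; i < docs[d].length; i++) {
        const w = docs[d][i];
        let k = z[d][i];
        // Remove the current assignment from the counts.
        nDocTopic[d][k]--; nTopicWord[k][w]--; nTopic[k]--;
        // Unnormalized full conditional p(z = t | all other assignments).
        let total = 0;
        for (let t = 0; t < K; t++) {
          p[t] = (nDocTopic[d][t] + alpha) * (nTopicWord[t][w] + beta) / (nTopic[t] + V * beta);
          total += p[t];
        }
        // Draw a new topic with probability proportional to p.
        let u = rng() * total;
        for (k = 0; k < K - 1; k++) {
          u -= p[k];
          if (u <= 0) break;
        }
        z[d][i] = k;
        nDocTopic[d][k]++; nTopicWord[k][w]++; nTopic[k]++;
      }
    }
  }

  // Point estimates: theta = per-document topic mix, phi = per-topic word mix.
  const theta = nDocTopic.map((row, d) => row.map((c) => (c + alpha) / (docs[d].length + K * alpha)));
  const phi = nTopicWord.map((row, k) => row.map((c) => (c + beta) / (nTopic[k] + V * beta)));
  return { theta, phi };
}

// Toy usage: two obvious themes in a six-word vocabulary.
const vocab = ["cricket", "match", "wicket", "budget", "tax", "debt"];
const docs = [[0, 1, 2, 0, 1], [2, 0, 1, 2, 0], [3, 4, 5, 3, 4], [5, 3, 4, 5, 3]];
const { theta, phi } = lda(docs, vocab.length, 2, { rng: mulberry32(42) });
phi.forEach((row, k) => {
  const top = row.map((w, id) => [w, vocab[id]]).sort((a, b) => b[0] - a[0]).slice(0, 3);
  console.log(`topic ${k}:`, top.map(([, word]) => word).join(", "));
});
```

Gibbs sampling is only one way to fit LDA; variational inference is another common choice. It is used here because it fits in a few dozen lines with no dependencies.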
I was playing around with tweets and topics yesterday. Unfortunately, I couldn’t find any JavaScript-based LDA implementation, so I wrote one. Or, to be more accurate, I converted an existing simple one-class implementation to JavaScript. To check how it worked on real data, I needed a tool with some documents, so I wrote that too.
Here’s twopicate, the output of about half a weekend of intermittent coding. You enter a search term, tell it how many topics you want, and press the button. It pulls tweets about that term from Twitter and extracts topics from them. Each topic is represented as a word cloud (on the right), and the larger a word, the more weight it has in the topic. The source tweets are on the left. Each tweet has a bar showing the percentage distribution of topics for that tweet. You can try it yourself by clicking below.
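Both displays map directly onto LDA's two outputs. Each tweet's bar is that document's topic distribution, and each word cloud is one topic's word distribution. Below is a sketch of that mapping. It assumes the model has already produced per-document topic probabilities (`theta`) and per-topic word probabilities (`phi`). The function names, the top-`n` cutoff and the linear 12 to 48 px font scaling are illustrative assumptions, not details taken from twopicate.

```javascript
// Turn one document's topic probabilities (summing to 1) into
// percentages for a stacked bar, rounded to one decimal place.
function topicPercentages(thetaRow) {
  return thetaRow.map((p) => Math.round(p * 1000) / 10);
}

// Keep the n highest-weight words of one topic and scale their font size
// linearly between minPx and maxPx, so heavier words are drawn larger.
function wordCloud(phiRow, vocab, n = 10, minPx = 12, maxPx = 48) {
  const top = phiRow
    .map((weight, id) => ({ word: vocab[id], weight }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, n);
  const maxW = top[0].weight;
  const minW = top[top.length - 1].weight;
  const range = maxW - minW || 1; // avoid dividing by zero when all weights are equal
  return top.map(({ word, weight }) => ({
    word,
    weight,
    fontPx: Math.round(minPx + ((weight - minW) / range) * (maxPx - minPx)),
  }));
}

// Example with made-up numbers for a two-topic model.
const vocab = ["cricket", "match", "wicket", "budget", "tax", "debt"];
const theta = [0.8, 0.2]; // one tweet: 80% topic 0, 20% topic 1
const phi0 = [0.4, 0.3, 0.2, 0.05, 0.03, 0.02]; // word weights of topic 0
console.log(topicPercentages(theta)); // [ 80, 20 ]
console.log(wordCloud(phi0, vocab, 3)); // cricket 48px, match 30px, wicket 12px
```

Linear scaling keeps the sketch simple. With many words of similar weight, a logarithmic or square-root scale would probably make the cloud easier to read.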
Since it’s a JavaScript-only solution, it runs in your browser and is consequently a bit slow. You might have to wait a minute after pressing the button.
Oh, and you can use the source.
A history of the “people-friendly” (عوام دوست) budget
2006: Dr. Salman Shah, the Prime Minister’s Adviser on Finance, has said that the 2006-07 budget will be people-friendly
http://www.millat.com/news.php?id=1107
2007: Upcoming federal and provincial budgets should be made people-friendly: Prime Minister Shaukat Aziz’s directive
http://daily.urdupoint.com/todayNewsLive.php?news_id=30651&featured=1&cat_id=2
2008: Hope the government will present a people-friendly budget, President Pervez Musharraf
http://www.urdupoint.com/budget_2008/News20-7-61-66430.html
2009: A balanced and people-friendly budget has been presented, Tanvir Kaira
http://www.jasarat.com/unicode/detail.php?category=3&newsid=18045
2010: The budget for the new fiscal year is historic and people-friendly, PPP
http://daily.urdupoint.com/featured_130124_1_2.html
2011: The upcoming budget should be people-friendly, President Zardari
http://express.com.pk/epaper/PoPupwindow.aspx?newsID=1101254876&Issue=NP_LHE&Date=20110602
2012: The budget will be people-friendly, President Zardari
http://www.bbc.co.uk/urdu/pakistan/2012/05/120531_zardari_budget_rwa.shtml
؏ پھر اگلی رُت کی فکر کرو (Then worry about the next season)