Where does the money go?

Last night, I took a look at the federal budget for 2012-2013. Apparently we will be spending about 25% on “Servicing of Domestic Debt”. Take a more detailed look here.

June 22, 2012

Urdu Sentiment Lexicon

With the increasing number of “opinion-dispensing apps” out there on the web which enable Urdu users to write in Unicode, there is (or will soon be) a need for getting some meaningful statistics out of the ever-present sentiment of the masses (or at least the web-savvy subset). This calls for resources which enable automatic processing of sentiment, one of which is a sentiment lexicon for Urdu. (For people uninitiated in computational linguistics, a lexicon is just a list of words.) Since I couldn’t find any sentiment lexicon available for Urdu on the tubes, I decided to put in some effort and create a new one. ...
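
To make the idea concrete, here is a minimal sketch of how a sentiment lexicon could be used to score a piece of text. The four entries and their +1/-1 polarities are illustrative assumptions, not taken from the lexicon described in the post, and the whitespace tokenizer is a deliberate simplification.

```python
# Hypothetical mini-lexicon: word -> polarity (+1 positive, -1 negative).
# These entries are illustrative only and are NOT from the actual lexicon.
LEXICON = {
    "اچھا": 1,   # good
    "خوش": 1,    # happy
    "برا": -1,   # bad
    "خراب": -1,  # spoiled / bad
}


def sentiment_score(text, lexicon=LEXICON):
    """Sum the polarity of every known word; unknown words count as 0."""
    tokens = text.split()  # naive whitespace tokenization (an assumption)
    return sum(lexicon.get(token, 0) for token in tokens)


if __name__ == "__main__":
    print(sentiment_score("موسم اچھا ہے"))   # "the weather is good"
    print(sentiment_score("کھانا خراب ہے"))  # "the food is bad"
```

A real system would also need to handle negation, inflected forms and spelling variants, which a plain word list does not cover.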

June 14, 2012

LDA based topic modelling in javascript

The Twitter API has changed a bit, so this post had to be updated. Check the updated post here. Topic modelling means detecting “abstract” topics from a collection of text documents. The most common textbook technique for doing that is Latent Dirichlet Allocation. Simply put, LDA is a statistical algorithm which takes documents as input and produces a list of topics. One catch is that you have to tell it how many topics you want. There’s much more to it but since this is not a tutorial post, I will stop here. (If you are interested in how it works, read the references given on the wiki page.) ...
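
For a rough feel of the "documents in, topics out" interface, here is a small sketch of LDA fitted with collapsed Gibbs sampling. It is written in Python rather than the JavaScript used in the post, and it is not the post's implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01,
              top_n=5, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the top words per topic.

    docs is a list of token lists. num_topics must be chosen by the caller.
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})

    doc_topic = [[0] * num_topics for _ in docs]         # tokens per (doc, topic)
    topic_word = [Counter() for _ in range(num_topics)]  # count per (topic, word)
    topic_total = [0] * num_topics                       # tokens per topic
    assignments = []                                     # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            z = rng.randrange(num_topics)
            doc_assignments.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word] += 1
            topic_total[z] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][word] -= 1
                topic_total[z] -= 1

                # Resample its topic given all the other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][word] += 1
                topic_total[z] += 1

    return [
        [word for word, count in topic_word[k].most_common(top_n) if count > 0]
        for k in range(num_topics)
    ]


if __name__ == "__main__":
    toy_docs = [
        ["cat", "dog", "pet", "dog"],
        ["dog", "pet", "cat", "vet"],
        ["stock", "market", "bank", "stock"],
        ["bank", "market", "stock", "loan"],
    ]
    for topic in lda_gibbs(toy_docs, num_topics=2):
        print(topic)
```

Note that `num_topics` is a required argument, which is the "catch" mentioned above: the algorithm does not discover how many topics there are.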

June 10, 2012

Twingual: A twitter client for bilingual tweeple

In my last post, I highlighted some problems that I face daily while using Twitter in Urdu as well as in English. A few days ago, I decided to experiment with the Twitter API and write my own client to fix some of these problems. You can see the result at www.twingual.com. It is a JavaScript-only Twitter client which supports neat Nastaleeq Urdu fonts as well as transliteration. It’s a work in progress and does not implement all Twitter features. If you like it and want to see something you need every day implemented, feel free to send a tweet. ...

May 9, 2012

Nastaleeq Urdu Typesetting: When will they get it right?

Last night, I read about the new Nastaleeq font available in Windows 8 and I just had to check it out. After leaving my machine up all night to install the consumer preview, I finally had time a while ago to check out the new “Urdu Typeset”. Although Microsoft explicitly states it to be a ‘document’ font, it never hurts to check out how it behaves in a web UI setting. Here’s a screenshot of how the Twitter Urdu page would look with the font. I had to do some CSS overriding to get that right (body.ur for the curious). ...

April 14, 2012

A challenge in time-limited search

I was trying to trace the source of the quote “Any sufficiently advanced financial instrument is indistinguishable from fraud.” If you do a quoted Google search on a custom date range, an interesting problem can be seen. The results contain pages originally published in 2005 but re-indexed recently. While re-indexing, the current tweets of the author were visible to the crawler and got indexed along with the original article. This makes it seem like the quoted text was first mentioned in 2005, whereas it is actually only a recent meme. ...

November 23, 2010

Google as a Question Answering System

A Question Answering (QA) system is an Information Retrieval system which gives the answer to a question posed in natural language. For example, if you ask it Who wrote Hamlet?, it should answer Shakespeare. A few years ago (don’t ask me how many), search engines did not focus on natural language queries. Recently [sic], Google has started incorporating some NLP (Natural Language Processing) in their results. You can try it out by typing the same question in the search box yourself (or clicking here). ...

February 6, 2010

outwit

I am playing around with a customized Twitter client, temporarily named ‘outwit’. I’ll try to add features as I need them, but it’s strictly an experiment for the time being. Let’s see if things go smoothly from here.

June 26, 2009

Creating Canonical / Normalized Links and URLs

Here is a short intro on how to make sure that major search engines (Google, Yahoo, Microsoft) can be directed to see different URLs with the same content as a single ‘canonical’ URL. For example, the following links point to the same page but have different URLs: http://nu.edu.pk/default.aspx, http://nu.edu.pk, http://www.nu.edu.pk/default.aspx, http://www.nu.edu.pk/ and http://nu.edu.pk/default.ASPX. The solution is to select a single URL as your representative and include this line in the HTML code. ...
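
As a rough illustration, the sketch below maps the five example URLs to one representative URL and builds a `<link rel="canonical">` element for it, which is the mechanism those search engines support for this purpose. Since the excerpt is cut off before the actual line, treat the tag shown here as an assumption about what the post goes on to say. The choice of `http://www.nu.edu.pk/` as the representative, the helper names, and the list of default document names are likewise assumptions.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Assumption: file names the server treats as the default page of a folder.
DEFAULT_DOCUMENTS = {"default.aspx", "index.html"}


def canonical_url(url, preferred_host="www.nu.edu.pk"):
    """Collapse host and default-document variants into one URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the "www." domain as the same site.
    if "www." + host == preferred_host:
        host = preferred_host

    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    # Drop a trailing default document, ignoring case (default.ASPX).
    if last.lower() in DEFAULT_DOCUMENTS:
        path = head + "/"

    return urlunsplit((parts.scheme.lower(), host, path, parts.query, ""))


def canonical_link_tag(url):
    """Build the element that goes inside the page's <head>."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


if __name__ == "__main__":
    variants = [
        "http://nu.edu.pk/default.aspx",
        "http://nu.edu.pk",
        "http://www.nu.edu.pk/default.aspx",
        "http://www.nu.edu.pk/",
        "http://nu.edu.pk/default.ASPX",
    ]
    for variant in variants:
        print(canonical_url(variant))  # each prints http://www.nu.edu.pk/
    print(canonical_link_tag(variants[0]))
    # prints <link rel="canonical" href="http://www.nu.edu.pk/" />
```

Whichever variant a visitor or crawler requests, the page would carry the same tag, so all five URLs are reported as one.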

February 16, 2009

Am I asking the right question?

Twitter asks ‘What are you doing?’ Google Latitude asks ‘Where are you?’ del.icio.us asks ‘What are you browsing?’ As in science, one of the most important things in web application development is thus asking the right question (at the right time). A properly phrased question can identify your niche along with giving you a guideline for keeping your focus on a single problem. As some people might notice, I tend NOT to write much anyway. So until the writer in me is back (if there _is_ one at all), I’ll try to write something even if it’s a one-liner. ...

February 10, 2009