Because high-schoolers need computers…

For under $1 million, every high school student in Punjab can have access to a computer.

Number of high schools = 5600 (source)
Price per computer = 16500 PKR (source)
Total = 92,400,000 PKR = 976,206 USD

Imagine a whole generation growing up on Khan Academy lectures and the Gutenberg library. Imagine these kids using Wikipedia to get both sides of an argument and playing around with Wolfram|Alpha. Imagine them falling in love with physics by appreciating the mysteries of light and getting high on chemistry by designing molecules. Imagine them learning how to pronounce the word “measure” properly and hearing Faiz reciting poetry as it was meant to be recited. ...
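The arithmetic above can be checked with a few lines of Python. This is only a sketch using the figures quoted in the post; the variable names are mine, and the exchange rate is not stated in the post but back-calculated from the two quoted totals. Note that the product, as written, corresponds to one computer per school.

```python
# Figures quoted in the post (variable names are illustrative).
NUM_HIGH_SCHOOLS = 5600
PRICE_PER_COMPUTER_PKR = 16_500
QUOTED_TOTAL_USD = 976_206

# One computer per school, as the post's multiplication implies.
total_pkr = NUM_HIGH_SCHOOLS * PRICE_PER_COMPUTER_PKR

# Assumption: the exchange rate is inferred from the quoted totals,
# it is not given anywhere in the post.
implied_pkr_per_usd = total_pkr / QUOTED_TOTAL_USD

print(f"Total: {total_pkr:,} PKR")
print(f"Implied exchange rate: {implied_pkr_per_usd:.2f} PKR per USD")
```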

July 2, 2012

Where does the money go?

Last night, I took a look at the federal budget for 2012-2013. Apparently we will be spending about 25% on “Servicing of Domestic Debt”. Take a more detailed look here.

June 22, 2012

Urdu Sentiment Lexicon

With the increasing number of “opinion-dispensing apps” out there on the web which enable Urdu users to write in Unicode, there is (or will soon be) a need for getting some meaningful statistics out of the ever-present sentiment of the masses (or at least the web-savvy subset). This calls for resources which enable automatic processing of sentiment, one of which is a sentiment lexicon for Urdu. (For people uninitiated in computational linguistics, a lexicon is just a list of words.) Since I couldn’t find any sentiment lexicon available for Urdu on the tubes, I decided to put in some effort and create a new one. ...

June 14, 2012

LDA based topic modelling in javascript

The Twitter API has changed a bit, so this post had to be updated. Check the updated post here. Topic modelling means detecting “abstract” topics from a collection of text documents. The most common textbook technique for this is Latent Dirichlet Allocation (LDA). Simply put, LDA is a statistical algorithm which takes documents as input and produces a list of topics. One catch is that you have to tell it how many topics you want. There’s much more to it, but since this is not a tutorial post, I will stop here. (If you are interested in how it works, read the references given on the wiki page.) ...

June 10, 2012

Twingual: A twitter client for bilingual tweeple

In my last post, I highlighted some problems that I face daily while using twitter in Urdu as well as in English. A few days ago, I decided to experiment with the Twitter API and write my own client to fix some of these problems. You can see the result at www.twingual.com. It is a JavaScript-only twitter client which supports neat Nastaleeq Urdu fonts as well as transliteration. It’s a work in progress and does not implement all twitter features. If you like it and want to see something you need every day implemented, feel free to send a tweet. ...

May 9, 2012

Making a copy of WEKA Instances

This ‘thing’ took about 30 minutes to figure out. According to the WEKA documentation, if you add a new Instance to an existing Instances object, String values are not transferred! If you are copying a dataset with a string attribute, you need to transfer the string manually. The code segment below copies the i-th instance from source to dest where the first attribute (at index 0) is a string attribute. ...

April 12, 2010

Online English to Urdu Translator

While all the online English to Urdu translators that I have seen don’t really work that well (read suck), if we make use of the overlapping vocabulary and grammar of Hindi and Urdu along with Google’s translation API, things come out pretty decent (as mentioned in my previous post). Here’s a small 15-minute first-cut script which just uses English to Hindi translation and then transliterates from Hindi to Urdu. Feel free to use the code and do ping me if you improve something. This works as a Hindi to Urdu transliterator as well. ...

January 23, 2010

What do you tweet about? : A shell script for getting most frequent words for twitter

There are a lot of web apps around which report your twitter stats. But at times, it’s better to do things yourself. I haven’t done any fun coding for ages now so last night, I finally got around to making a small program to gather twitter word statistics. The fun part was to do everything using unix tools. Here’s a small script file which displays the 10 most used words in the tweets for any twitter id. I have only tested it under cygwin so this is probably the best place to say “USE AT YOUR OWN RISK”. ...
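For readers who want the gist of the counting step without the shell pipeline, here is a rough Python sketch. It is not the script from the post: it skips fetching tweets altogether and just counts words in a list of strings, and the function name `top_words` and the sample tweets are made up for illustration.

```python
import re
from collections import Counter


def top_words(tweets, n=10):
    """Return the n most frequent words across the given tweet texts."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase, then keep runs of letters (with an optional
        # apostrophe part, so "don't" stays a single word).
        words = re.findall(r"[a-z]+(?:'[a-z]+)?", tweet.lower())
        counts.update(words)
    return counts.most_common(n)


# Hypothetical sample data; in practice these would be fetched tweets.
sample_tweets = [
    "Unix tools are fun",
    "Fun with unix pipes and more unix",
]

for word, count in top_words(sample_tweets, n=3):
    print(word, count)
```

A real version would probably also want to drop stop words, URLs and @mentions before counting.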

December 18, 2009

JabRef and Google Scholar

I can’t seem to find any way to import the bib entries provided by Google Scholar into JabRef directly. You can enable the Import into BibTex link from the preferences, but it streams the bib file as text/plain, which opens up in the browser. You can save it and import it, but that wastes a lot of clicks. The easiest option is to copy-paste all the text into a new JabRef entry (Ctrl+N). The default settings leave the double curly braces in the title (to preserve case), which can be removed by enabling the Remove double braces… checkbox in the File tab of Options/Preferences. This works for JabRef 2.5. ...

November 14, 2009

Are you interested in using computers in the classrooms?

A friend of mine is carrying out research in classroom based e-assessment in developing countries such as Pakistan. The aim of the research is to assist primary school teachers with computer software that:
· Is aligned with the particular subject curriculum they follow in their schools.
· Provides pupils with challenges and interactive short quizzes and tests to take after completing a topic taught by the teacher in the classroom. ...

July 31, 2009