Where Iqbal lived in Cambridge

2010 July 7

 

During his undergrad (if you can call it that), Iqbal read at Trinity College, Cambridge. By the current definition of the phrase, he was a ‘mature student’. He stayed at 17 Portugal Place. At that time, the house might have been college-owned, but I can’t confirm that.

It’s a smallish house with a narrow street on one side and a wider one on the other. The wider street opens onto Jesus Green, a large public park. The house is a five-minute walk from the River Cam.

I wish I could go and live there for some time, just to check if creativity is influenced by proximity to greatness, even if it’s time-shifted. Without going into any more details, here are some pictures.

the street 

the sky

the corner

the house

the plaque

observing infinities

2010 July 2
by awais

 

infinities

Puedo escribir los versos más tristes esta noche… (Tonight I can write the saddest lines…)

2010 June 27
by awais

write the saddest lines

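Puedo escribir los versos más tristes esta noche.
(Tonight I can write the saddest lines.)

Escribir, por ejemplo: "La noche está estrellada,
y tiritan, azules, los astros, a lo lejos".
(Write, for example: "The night is starry,
and the stars, blue, shiver in the distance.")

El viento de la noche gira en el cielo y canta.
(The night wind whirls in the sky and sings.)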

 

The night wind

(P.S. Quoting Neruda doesn’t mean you are in love… Being in love doesn’t mean you should quote Neruda)

(P.P.S. The images go with the lines. I just took the pictures)

of birds and boulders

2010 June 21
by awais

stone at lion's yard

Checking xpollinate….

Making a copy of WEKA Instances

2010 April 13

image

This ‘thing’ took about 30 minutes to figure out. According to the WEKA documentation, if you add a new Instance to an existing Instances object, String values are not transferred! So if you are copying a dataset that has a string attribute, you need to transfer the strings manually. The code segment below copies the i-th instance from source to dest, where the first attribute (at index 0) is a string attribute.

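// Append a (shallow) copy of the i-th instance; its string value is NOT carried over.
dest.add(source.instance(i));
// Re-set the string attribute (index 0) on the newly added, last instance.
// stringValue(0) returns the raw string; toString(0) may wrap it in quotes.
dest.instance(dest.numInstances() - 1)
    .setValue(0, source.instance(i).stringValue(0));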

This should come in handy for text classification with WEKA (and hopefully save you some time).
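Why do the strings get lost in the first place? A string attribute value is stored in an instance as a number: an index into the attribute's own list of string values (roughly how WEKA does it). The Java toy below models only that idea; `ToyDataset`, `copyRow` and everything else in it are hypothetical and are not part of the WEKA API. Copying the raw values copies just the index, which means nothing in the destination's string table, while re-adding the string afterwards (what the `setValue` call in the WEKA snippet does) restores it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, NOT the WEKA API: each dataset keeps its own string table, and a row
// stores only an index into that table (roughly how WEKA stores string attributes).
public class StringCopyToy {
    static class ToyDataset {
        final List<String> strings = new ArrayList<>(); // this dataset's string table
        final List<double[]> rows = new ArrayList<>();   // one double per attribute

        // Adds s to the string table (if new) and returns its index as a double.
        double addString(String s) {
            int idx = strings.indexOf(s);
            if (idx < 0) {
                strings.add(s);
                idx = strings.size() - 1;
            }
            return idx;
        }

        // Naive add: copies the raw values (i.e. the indices), not the strings.
        void add(double[] row) {
            rows.add(row.clone());
        }

        // Resolves a string value; returns null if the index is not in this table.
        String stringValue(int row, int att) {
            int idx = (int) rows.get(row)[att];
            return idx < strings.size() ? strings.get(idx) : null;
        }
    }

    // Copies row i of src into dest, then re-adds the string at attribute 0
    // to dest's own table (mirrors the add-then-setValue pattern above).
    static void copyRow(ToyDataset src, ToyDataset dest, int i) {
        dest.add(src.rows.get(i));
        double[] copied = dest.rows.get(dest.rows.size() - 1);
        copied[0] = dest.addString(src.stringValue(i, 0));
    }

    public static void main(String[] args) {
        ToyDataset src = new ToyDataset();
        src.add(new double[] {src.addString("first doc"), 1.0});
        src.add(new double[] {src.addString("second doc"), 0.0});

        ToyDataset naive = new ToyDataset();
        naive.add(src.rows.get(1));
        System.out.println(naive.stringValue(0, 0)); // null: index 1 means nothing here

        ToyDataset fixed = new ToyDataset();
        copyRow(src, fixed, 1);
        System.out.println(fixed.stringValue(0, 0)); // second doc
    }
}
```

In real WEKA a stale index is more likely to point at the wrong string (or fail) than to come back as null; the toy just makes the missing link visible.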

Google and Urdu Stemming

2010 March 5
by awais

 

Is Google (finally) stemming Urdu? The last time I checked, they were doing something like transliteration-based search, but in the screenshot below you can see that searching for the phrase ان پڑھ چٹا (roughly "utterly illiterate") shows that some stemming is being used. Does anyone know anything? Oh, and while I’m on this topic, I would also like to know why it is called چٹا ان پڑھ (literally "white illiterate").

image

Google as a Question Answering System

2010 February 6
by awais

A Question Answering (QA) system is an Information Retrieval system that returns the answer to a question posed in natural language. For example, if you ask it "Who wrote Hamlet?", it should answer "Shakespeare". A few years ago (don’t ask me how many), search engines did not focus on natural language queries. Recently [sic], Google has started incorporating some NLP (Natural Language Processing) into its results. You can try it yourself by typing the same question into the search box.

image

During my M.Phil. course, one of the tasks was to build a basic QA system and extend it however we liked. We used the TREC-8 dataset for evaluation. While building the system, I evaluated how current search engines (read: Google) performed on this task: I submitted each question verbatim as a query and used the summaries of the top five results as candidate answers. At that time (2008), this gave a Mean Reciprocal Rank (MRR) score of 0.212 over 198 questions; for 156 of them, no answer was found in the top five responses.
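For readers unfamiliar with the metric: MRR averages, over all questions, the reciprocal of the rank at which the first correct answer appears, i.e. MRR = (1/|Q|) Σ 1/rank_i, and a question with no correct answer in the top five contributes zero. Here is a minimal Java sketch of that computation; the class name `MrrSketch` and the example ranks are hypothetical, not data from the TREC-8 runs.

```java
import java.util.List;

public class MrrSketch {
    /**
     * ranks.get(i) is the 1-based rank of the first correct answer for question i,
     * or 0 if no correct answer appeared within the cutoff (it then contributes 0).
     */
    public static double meanReciprocalRank(List<Integer> ranks) {
        if (ranks.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (int r : ranks) {
            if (r > 0) {
                sum += 1.0 / r;
            }
        }
        return sum / ranks.size();
    }

    public static void main(String[] args) {
        // Hypothetical ranks for four questions: (1 + 1/2 + 0 + 1/4) / 4
        System.out.println(meanReciprocalRank(List.of(1, 2, 0, 4))); // 0.4375
    }
}
```

Because unanswered questions contribute zero, MRR can never exceed the fraction of questions answered within the cutoff.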

This term, I am demonstrating for the same task. Demonstrators are usually PhD students who provide help and guidance to junior students. For pure geek fun, and for lack of better things to do during a break, I decided to quickly jot down a JavaScript (read: jQuery) based QA system. This time, the resulting MRR score over the 198 questions was 0.384, and only 79 questions had no answer found in the top five responses.

The results clearly show that over the last two years, Google has significantly improved at answering natural language queries. In fact (IIRC), my baseline system back in 2008 (based on RMRS-based matching of sentences from the top 100 documents returned by an IR system) could only achieve an MRR score of approximately 0.290, so the current results are well ahead of that baseline. I hope this decade sees more improvements in QA systems, so that I can ask a system "What do you get if you multiply six by nine?"

I’ve always said there was something fundamentally wrong with the universe. ~Arthur Dent

Visualizing Citation Networks

2010 February 4

aclnet

For techies: I’ve been working on citation networks lately. You can visualize such a network as a graph in which the nodes represent publications (papers, articles, etc.) and the directed edges represent citations between them. The graph above was produced using GraphViz. The data is from the ACL Anthology Network, which contains publications from the publicly available ACL Anthology.

For non-techies: Oooooo! pretty picture!
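For the techies, a minimal sketch of how such a graph can be handed to GraphViz: the Java class below writes a citation map as a directed graph in DOT, GraphViz's input language, with an edge from each citing paper to each paper it cites. The class name `CitationDot` and the paper IDs are hypothetical; they are not ACL Anthology Network identifiers and this is not the pipeline behind the picture.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: writes a citation map as a GraphViz DOT digraph.
public class CitationDot {
    // Quotes a node ID for DOT, escaping any embedded double quotes.
    static String quote(String id) {
        return "\"" + id.replace("\"", "\\\"") + "\"";
    }

    // cites maps each paper ID to the IDs of the papers it cites;
    // every key becomes a node and every citation a directed edge citing -> cited.
    public static String toDot(Map<String, List<String>> cites) {
        StringBuilder sb = new StringBuilder("digraph citations {\n");
        for (Map.Entry<String, List<String>> e : cites.entrySet()) {
            sb.append("  ").append(quote(e.getKey())).append(";\n");
            for (String cited : e.getValue()) {
                sb.append("  ").append(quote(e.getKey()))
                  .append(" -> ").append(quote(cited)).append(";\n");
            }
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical IDs; a TreeMap keeps the output order deterministic.
        Map<String, List<String>> cites = new TreeMap<>();
        cites.put("paperA", List.of());
        cites.put("paperB", List.of("paperA"));
        cites.put("paperC", List.of("paperA", "paperB"));
        System.out.print(toDot(cites));
    }
}
```

Saving the output as citations.dot and running `dot -Tpng citations.dot -o citations.png` renders it. For a network the size of the ACL Anthology, GraphViz's `sfdp` layout engine, which is designed for large graphs, is often a better fit than `dot`.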

A Typical Day of Research (and why I hate Depth First Search)

2010 January 29
by awais

 

image

Online English to Urdu Translator

2010 January 24

While all the online English to Urdu translators I have seen don’t work that well (read: suck), if we make use of the overlapping vocabulary and grammar of Hindi and Urdu along with Google’s translation API, the results come out pretty decent (as mentioned in my previous post). Here’s a small, 15-minute, first-cut script that uses English to Hindi translation and then transliterates from Hindi to Urdu. Feel free to use the code, and do ping me if you improve something. It also works as a standalone Hindi to Urdu transliterator.



(Thanks to عزت مآب جناب آغا علی رضا قزلباش رحمتہ اللہ علیہ, "His Excellency, the honourable Agha Ali Raza Qizilbash, may God's mercy be upon him", who graciously sent me his term report on Hindi to Urdu transliteration, from which I copied (and modified) the character mapping.)
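As a rough illustration of the Hindi to Urdu transliteration step only, here is a minimal Java sketch of a character-by-character Devanagari-to-Urdu mapping. It is not the original script, and the class name `HindiUrduSketch` and its tiny mapping table are hypothetical (not the mapping from the term report); the English to Hindi step via Google’s translation API is not sketched. Unicode escapes keep the source compilable under any file encoding. Since Urdu usually leaves short vowels unwritten, a consonant's inherent "a" simply produces no letter. Real transliteration needs much more: independent vowels, nasalization, aspirated consonants and context-dependent letter choices.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, deliberately tiny Devanagari -> Urdu character map (illustration only).
public class HindiUrduSketch {
    private static final Map<Character, String> MAP = new HashMap<>();

    static {
        MAP.put('\u0915', "\u06A9"); // ka -> kaf
        MAP.put('\u0917', "\u06AF"); // ga -> gaf
        MAP.put('\u0924', "\u062A"); // ta (dental) -> te
        MAP.put('\u0926', "\u062F"); // da (dental) -> dal
        MAP.put('\u0928', "\u0646"); // na -> nun
        MAP.put('\u092E', "\u0645"); // ma -> mim
        MAP.put('\u0930', "\u0631"); // ra -> re
        MAP.put('\u0932', "\u0644"); // la -> lam
        MAP.put('\u0938', "\u0633"); // sa -> sin
        MAP.put('\u093E', "\u0627"); // vowel sign aa -> alif
        MAP.put('\u094D', "");       // virama: kills the inherent vowel, no Urdu letter
    }

    // Maps each character independently; unmapped characters pass through unchanged.
    public static String transliterate(String hindi) {
        StringBuilder out = new StringBuilder();
        for (char c : hindi.toCharArray()) {
            out.append(MAP.getOrDefault(c, String.valueOf(c)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Devanagari "kaam" (work) -> Urdu kaf + alif + mim
        System.out.println(transliterate("\u0915\u093E\u092E"));
    }
}
```

For example, `transliterate` turns काम ("work") into کام and नमक ("salt") into نمک.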