DependenSee: A Dependency Parse Visualisation/Visualization Tool

There aren’t many tools that let you visualise sentences parsed with dependency grammars. Here’s a small tool which generates a PNG of the dependency graph of a given sentence using the Stanford Parser. How to run: The dependency graph shown in the image above for Einey’s quote can be generated by following these steps. Click here to download <dependensee-3.7.0.jar>. Download the latest version of the Stanford Parser. I am using version 3....

August 28, 2010

Making a copy of WEKA Instances

This ‘thing’ took about 30 minutes to figure out. According to the WEKA documentation, if you add a new Instance to an existing Instances object, string values are not transferred! If you are copying a dataset with a string attribute, you need to transfer the string manually. The code segment below copies the i-th instance from source to dest, where the first attribute (at index 0) is a string attribute....

April 12, 2010

Google and Urdu Stemming

Is Google (finally) stemming Urdu? The last time I checked, they were doing something like a transliteration-based search, but in the screenshot below, you can see that searching for the phrase ان پڑھ چٹا shows some stemming is being used. Does anyone know anything? Oh, and while I’m on this topic, I would also like to know why it is called چٹا ان پڑھ ?

March 5, 2010

Visualizing Citation Networks

For techies: I’ve been working on citation networks lately. You can visualize such a network as a graph. In this graph, the nodes represent publications (papers, articles, etc.) and the edges represent citations between them. The graph above was produced using GraphViz. The data is from the ACL Anthology Network, which contains publications from the publicly available ACL Anthology. For non-techies: Oooooo! pretty picture!
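
For the curious, here is a minimal sketch of how a citation network could be turned into GraphViz input. It is not the script used for the graph above; the function name citations_to_dot and the paper IDs are made up for illustration, and the input is assumed to be a plain dictionary mapping each paper to the papers it cites.

```python
def citations_to_dot(citations):
    """Return DOT source for a directed citation graph.

    `citations` maps a paper ID to the list of paper IDs it cites
    (an assumed input format, chosen for this sketch).
    """
    lines = ["digraph citations {", "  node [shape=point];"]

    # Collect every paper, including ones that are only ever cited.
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)

    # One node per publication.
    for node in sorted(nodes):
        lines.append(f'  "{node}";')

    # One directed edge per citation: citing paper -> cited paper.
    for citing in sorted(citations):
        for cited in sorted(citations[citing]):
            lines.append(f'  "{citing}" -> "{cited}";')

    lines.append("}")
    return "\n".join(lines)


# Made-up paper IDs, for illustration only.
example = {"paper_a": ["paper_b", "paper_c"], "paper_b": ["paper_c"]}
dot_source = citations_to_dot(example)
print(dot_source)
```

Saving the printed text to a file such as citations.dot and running `dot -Tpng citations.dot -o citations.png` should render it as an image.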

February 4, 2010