tl;dr: I built a KWIC concordance of Iqbal’s Urdu poetry. Check it out at iqbal.chaoticity.com!
In distributional semantics, just as in the interpretation of law, a word is known by the company it keeps. Observing how a word appears in the context of other words lets us identify nuances of usage and gain deeper insight into its meaning and connotations.
Traditional word indexes do not show this context and thus miss out on the most remarkable ability of the human brain: recognising patterns. Which is perhaps why, in the late 1950s, Hans Peter Luhn (the father of hash codes!) came up with the idea of an index for technical documents that showed each keyword in its original context, making information retrieval more efficient. The approach was a significant development in text processing and laid the foundation for later concordance-based analysis. He called the concept Keyword in Context (KWIC, pronounced "quick") and it caught on … quickly.
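To make the idea concrete, here is a minimal KWIC sketch in Python. It is not the code behind iqbal.chaoticity.com; the function names `kwic` and `format_kwic` and the `window` parameter are made up for illustration, and it assumes that splitting on whitespace is a good enough tokenizer. It also pads the left context for left-to-right display, so right-to-left text such as Urdu would need extra care.

```python
def kwic(lines, keyword, window=3):
    """Collect (left context, keyword, right context) for every match.

    Assumption: tokens are whitespace-separated, and surrounding
    punctuation is ignored when comparing a token to the keyword.
    """
    rows = []
    target = keyword.lower()
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token.strip(".,;:!?\"'").lower() == target:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                rows.append((left, token, right))
    return rows


def format_kwic(rows):
    """Right-align the left context so the keywords line up in one column."""
    pad = max((len(left) for left, _, _ in rows), default=0)
    return [f"{left:>{pad}}  {kw}  {right}".rstrip() for left, kw, right in rows]


if __name__ == "__main__":
    for row in format_kwic(kwic(["All's well that ends well"], "well", window=2)):
        print(row)
```

Stacking the keyword in a single column is what lets the eye scan the neighbouring words for patterns, which is the whole point of the format.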
You can probably find tens of concordance generators online; for instance, ptx is bundled with most Linux distros these days (try echo "All's well that ends well" | ptx). A concordance generator (concordancer?) specific to Urdu poetry is, however, a slightly different story. There have been multiple very detailed (sometimes manual!) works for Ghalib and Iqbal, mostly published as books; for example, one hosted by Iqbal Academy has been available for quite some time. However, none of the existing works use KWIC formatting, so they overlook an effective tool for linguistic analysis and for a deeper understanding of language patterns.
So I built one! And it’s open source!
Please give it a go and post any issues on the GitHub repo. It’s a work in progress and I would love some help curating the source text. PRs welcome!