<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Chaoticity &#187; Language</title>
	<atom:link href="http://chaoticity.com/category/language/feed/" rel="self" type="application/rss+xml" />
	<link>http://chaoticity.com</link>
	<description>a state of things in which chance is supreme</description>
	<lastBuildDate>Mon, 23 Jan 2012 20:05:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>The Floor Code</title>
		<link>http://chaoticity.com/the-floor-code/</link>
		<comments>http://chaoticity.com/the-floor-code/#comments</comments>
		<pubDate>Sun, 10 Jul 2011 08:53:16 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[Art]]></category>
		<category><![CDATA[chaos]]></category>
		<category><![CDATA[Language]]></category>

		<guid isPermaLink="false">http://chaoticity.com/the-floor-code/</guid>
		<description><![CDATA[If you walk into my department, one of the first things you may notice is that some of the tiles on the floor are black, with no apparent pattern. These tiles actually encode a message. The curious amongst us are supposed to decode this but despite having spent 3 years in [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><p><a href="http://chaoticity.com/images/DSCF7495.jpg" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="DSCF7495" border="0" alt="Computer Lab Floor" src="http://chaoticity.com/images/DSCF7495_thumb.jpg" width="570" height="336"></a></p>
<p>If you walk into <a href="http://www.cl.cam.ac.uk" target="_blank">my department</a>, one of the first things you may notice is that some of the tiles on the floor are black, with no apparent pattern. These tiles actually encode a message. The curious amongst us are supposed to decode it, but despite having spent 3 years in the department, I never found the time until last Friday. The decoding should be pretty simple if you want to try your skills. The last 6 letters of the first word can be read off this picture. If you are too lazy, just <a href="#" onclick="showcode();">click here</a> for the explanation. (Anyone who has taken an Introduction to Computer Science course should at least try for ONE minute before clicking.)</p>
<p style="display:none" id="floorcode">The code says: Computer LAboratory — AD 2001 — ☻. If you look closely, the tiles form squares, and some of them are split down the middle so that the left half is always black and the right half is always white. Take each black-white combo as a 1 and each white-white combo as a 0. It’s a straightforward UTF encoding after that. For example, the row closest to the bottom of the pic is WWBWBWWWWWWWWWBW, where W is white and B is black. Taking WW = 0 and BW = 1, the code becomes 01100001, i.e. 97 in decimal, and its ASCII equivalent is ‘a’.</p>
<p>Geek Art!</p>
<p>(Oh! and clicking on the image opens up a high res version.) </p>
<p><script> var fcode=false;function showcode(){document.getElementById('floorcode').style.display= (fcode?'none':'block');fcode=!fcode;}</script></p>
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/the-floor-code/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How to change font on the BBC Urdu website</title>
		<link>http://chaoticity.com/how-to-change-font-on-the-bbc-urdu-website/</link>
		<comments>http://chaoticity.com/how-to-change-font-on-the-bbc-urdu-website/#comments</comments>
		<pubDate>Tue, 17 May 2011 23:50:14 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[chaos]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Urdu]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://chaoticity.com/how-to-change-font-on-the-bbc-urdu-website/</guid>
		<description><![CDATA[Let’s face it. The font on the BBC Urdu website sucks. When a friend complained about it on our alumni list, I thought of writing a small greasemonkey script to take care of the problem. The results are pretty good, as visible in the image below. The left part is the site after installing the [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><p align="justify">Let’s face it. The font on the <a href="http://www.bbc.co.uk/urdu/" target="_blank">BBC Urdu website</a> sucks. When a friend complained about it on our alumni list, I thought of writing a small greasemonkey script to take care of the problem. The results are pretty good, as visible in the image below. The left part is the site after installing the Urdu Naskh Asiatype font provided by BBC and before installing the script (and I maintain, Aijaz, it sucks). The right part is after installing the script. Click on the image and you’ll get an unscaled version.</p>
<p align="justify">To install the script, click the link below and follow the installation instructions given there. Currently, it works only on Chrome and Firefox.</p>
<p align="center"><a href="http://userscripts.org/scripts/show/103069" target="_blank">BBC Urdu Font Changer @ userscripts.org</a></p>
<p>and the world is a bit better now…</p>
<p><a href="http://chaoticity.com/images/change1.png" target="_blank"><img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="change" border="0" alt="change" src="http://chaoticity.com/images/change_thumb1.png" width="600" height="313" /></a></p>
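<p>For the curious, a font-changing userscript can be very small. The sketch below is not the script linked above; it only illustrates the general idea under a few assumptions: the script name, the <code>@include</code> pattern, the helper names <code>buildFontCss</code> and <code>applyFont</code>, and the font (Jameel Noori Nastaleeq) are all picked for this example.</p>

```javascript
// ==UserScript==
// @name     Urdu font override (illustrative sketch)
// @include  http://www.bbc.co.uk/urdu/*
// ==/UserScript==

// Build a CSS rule that forces a font on every element of the page.
// The font name passed in is the caller's choice, not necessarily the one the real script uses.
function buildFontCss(fontFamily) {
  return '* { font-family: "' + fontFamily + '", serif !important; }';
}

// Inject the rule into the given document through a new style element.
function applyFont(doc, fontFamily) {
  var style = doc.createElement('style');
  style.textContent = buildFontCss(fontFamily);
  doc.head.appendChild(style);
}

// Only touch the DOM when one is available (i.e. when running in a browser).
if (typeof document !== 'undefined') {
  applyFont(document, 'Jameel Noori Nastaleeq');
}
```

<p>The font has to be installed on the machine for the override to have any visible effect; otherwise the browser falls back to the generic serif family.</p>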
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/how-to-change-font-on-the-bbc-urdu-website/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>DependenSee: A Dependency Parse Visualisation/Visualization Tool</title>
		<link>http://chaoticity.com/dependensee-a-dependency-parse-visualisation-tool/</link>
		<comments>http://chaoticity.com/dependensee-a-dependency-parse-visualisation-tool/#comments</comments>
		<pubDate>Sat, 28 Aug 2010 18:36:21 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Dependency]]></category>
		<category><![CDATA[Graph]]></category>
		<category><![CDATA[Parser]]></category>
		<category><![CDATA[Stanford]]></category>
		<category><![CDATA[Tree]]></category>
		<category><![CDATA[Typed]]></category>
		<category><![CDATA[Visualisation]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://chaoticity.com/dependensee-a-dependency-parse-visualisation-tool/</guid>
		<description><![CDATA[There aren’t many tools which allow you to visualise sentences parsed with dependency grammars. Here’s a small tool which generates a PNG of the dependency graph of a given sentence using the Stanford Parser. You can generate the image for Einey’s quote below by following these steps. Click here to download DependenSee.jar. Download the [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div>
<p>There aren’t many tools which allow you to visualise sentences parsed with <a href="http://en.wikipedia.org/wiki/Dependency_grammar" target="_blank">dependency grammars</a>. Here’s a small tool which generates a PNG of the dependency graph of a given sentence using the <a href="http://nlp.stanford.edu/software/lex-parser.shtml" target="_blank">Stanford Parser</a>. You can generate the image for Einey’s quote below by following these steps.</p>
<p><a href="http://chaoticity.com/images/out.png"><img style="display: inline; border: 0px;" title="out" src="http://chaoticity.com/images/out_thumb.png" border="0" alt="out" width="596" height="212" /></a></p>
<ol>
<li>Click here to download <code><a href="http://chaoticity.com/software/DependenSee.jar" target="_blank">DependenSee.jar</a></code>.</li>
<li>Download the <a href="http://nlp.stanford.edu/software/lex-parser.shtml#Download" target="_blank">latest version of the Stanford Parser</a>.  I am using version 1.6.9. (For 1.6.7, use <code><a href="http://chaoticity.com/software/DependenSee.1.6.7.jar" target="_blank">DependenSee.1.6.7.jar</a></code> and for versions &lt;1.6.6, use <code><a href="http://chaoticity.com/software/DependenSeeOld.jar" target="_blank">DependenSeeOld.jar</a></code> everywhere. )</li>
<li>Extract <code>stanford-parser.jar</code> and <code>englishPCFG.ser.gz</code> in the same folder as <code>DependenSee.jar</code>.</li>
<li>On the command prompt, run <code>java -cp DependenSee.jar;stanford-parser.jar com.chaoticity.dependensee.Main "Example isn't another way to teach, it is the only way to teach." out.png</code> (If you are on *nix, replace the semicolon with a colon and make sure you have Arial installed. If you already have a parsed dependency output file, replace the sentence with <code>-t input.txt</code>.)</li>
<li>Open <code>out.png</code> and admire :)</li>
</ol>
<p>I have added part-of-speech tags and very basic edge overlap management, and might add more eye candy later (curved/coloured edges?). You can link the library in your code as well. An example is given below. Comments and queries are welcome.</p>
<p><code><br />
import edu.stanford.nlp.trees.*;<br />
import edu.stanford.nlp.parser.lexparser.*;<br />
import com.chaoticity.dependensee.*;<br />
import java.util.Collection;<br />
class Test {<br />
public static void main(String []args) throws Exception {<br />
String text = "A quick brown fox jumped over the lazy dog.";<br />
// Set up the English (Penn Treebank) language pack and its dependency factory.<br />
TreebankLanguagePack tlp = new PennTreebankLanguagePack();<br />
GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();<br />
// Load the English PCFG grammar from the working directory.<br />
LexicalizedParser lp = new LexicalizedParser("englishPCFG.ser.gz");<br />
lp.setOptionFlags(new String[]{"-maxLength", "500", "-retainTmpSubcategories"});<br />
// Parse the sentence, then derive its typed dependencies.<br />
Tree tree = lp.apply(text);<br />
GrammaticalStructure gs = gsf.newGrammaticalStructure(tree);<br />
Collection tdl = gs.typedDependenciesCCprocessed(true);<br />
// Render the dependency graph to a PNG file.<br />
Main.writeImage(tree,tdl, "image.png",3);<br />
}<br />
}<br />
</code></p>
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/dependensee-a-dependency-parse-visualisation-tool/feed/</wfw:commentRss>
		<slash:comments>73</slash:comments>
		</item>
		<item>
		<title>Google and Urdu Stemming</title>
		<link>http://chaoticity.com/google-and-urdu-stemming/</link>
		<comments>http://chaoticity.com/google-and-urdu-stemming/#comments</comments>
		<pubDate>Fri, 05 Mar 2010 02:30:25 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Urdu]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[stemming]]></category>

		<guid isPermaLink="false">http://chaoticity.com/google-and-urdu-stemming/</guid>
		<description><![CDATA[Is Google (finally) stemming Urdu? The last time I checked, they were doing something like a transliteration-based search, but in the screenshot below, you can see that searching for the phrase ان پڑھ چٹا shows some stemming is being used. Does anyone know anything? Oh, and while I’m on this topic, I would [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div>
<p>Is Google (finally) stemming Urdu? <a href="http://scalar.wordpress.com/2008/06/02/stemming-in-urdu-and-google/" target="_blank">The last time I checked</a>, they were doing something like a transliteration-based search, but in the screenshot below, you can see that <a href="http://www.google.co.uk/search?q=+%D8%A7%D9%86+%D9%BE%DA%91%DA%BE+%DA%86%D9%B9%D8%A7" target="_blank">searching for the phrase ان پڑھ چٹا</a> shows some stemming is being used. Does anyone know anything? Oh, and while I’m on this topic, I would also like to know why it is called چٹا ان پڑھ?</p>
<p><a href="http://chaoticity.com/images/image10.png"><img title="image" style="border-right: 0px; border-top: 0px; display: inline; border-left: 0px; border-bottom: 0px" height="770" alt="image" src="http://chaoticity.com/images/image_thumb10.png" width="514" border="0" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/google-and-urdu-stemming/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Google as a Question Answering System</title>
		<link>http://chaoticity.com/google-as-a-question-answering-system/</link>
		<comments>http://chaoticity.com/google-as-a-question-answering-system/#comments</comments>
		<pubDate>Sat, 06 Feb 2010 04:34:58 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[chaos]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Web]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[MRR]]></category>
		<category><![CDATA[Question Answer]]></category>
		<category><![CDATA[Summary]]></category>

		<guid isPermaLink="false">http://chaoticity.com/google-as-a-question-answering-system/</guid>
		<description><![CDATA[A Question Answering (QA) system is an Information Retrieval system which gives the answer to a question posed in natural language. For example, if you ask it Who wrote Hamlet?, it should answer Shakespeare. A few years ago (don’t ask me how many), search engines did not focus on language queries. Recently [sic], Google has [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><p>A <a href="http://en.wikipedia.org/wiki/Question_answering">Question Answering</a> (QA) system is an Information Retrieval system which gives the answer to a question posed in natural language. For example, if you ask it <i>Who wrote Hamlet?</i>, it should answer <i>Shakespeare</i>. A few years ago (don’t ask me how many), search engines did not focus on language queries. Recently [sic], Google has started incorporating some NLP (Natural Language Processing) in their results. You can try it out by typing the same question in the search box yourself (<a href="http://www.google.co.uk/search?q=Who+wrote+Hamlet">or clicking here</a>).</p>
<p><a href="http://chaoticity.com/images/image9.png"><img title="image" style="border-right: 0px; border-top: 0px; display: inline; margin-left: 0px; border-left: 0px; margin-right: 0px; border-bottom: 0px" height="199" alt="image" src="http://chaoticity.com/images/image_thumb9.png" width="330" align="right" border="0" /></a> </p>
<p>During my <a href="http://www.cl.cam.ac.uk/admissions/cstit/">M.Phil. course</a>, one of the tasks was to build a basic QA system and extend it however we liked. We used the <a href="http://trec.nist.gov/data/qa/t8_qadata.html">TREC 8 dataset</a> for evaluations. While building the system, I evaluated how current search engines (read Google) performed on this task. For this, I just queried the exact question and used the summaries of the top five results as answers. Evaluating at that time (2008), I got a <a href="http://en.wikipedia.org/wiki/Mean_reciprocal_rank">Mean Reciprocal Rank</a> (MRR) score of <b>0.212</b> over 198 questions. For 156 questions, no answer was found in the top 5 responses.</p>
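<p>For readers unfamiliar with MRR: each question scores the reciprocal of the rank at which the first correct answer appears (1 for the first result, 1/2 for the second, and so on), questions with no correct answer in the list score 0, and the scores are averaged over all questions. A minimal JavaScript sketch, with a made-up function name and made-up ranks rather than the TREC data:</p>

```javascript
// ranks[i] is the 1-based position of the first correct answer for question i,
// or 0 when none of the returned answers is correct.
function meanReciprocalRank(ranks) {
  var total = 0;
  ranks.forEach(function (rank) {
    if (rank > 0) {
      total += 1 / rank;
    }
  });
  return total / ranks.length;
}

// Hypothetical ranks for four questions: correct at 1, at 2, not found, at 4.
console.log(meanReciprocalRank([1, 2, 0, 4])); // 0.4375
```

<p>Unanswered questions still count in the denominator, which is why a large number of misses pulls the score down so sharply.</p>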
<p>This term, I am demonstrating for the same task. Demonstrators are usually PhD students who provide help and guidance to junior students. For pure geek fun and lack of better things to do while taking a break, I decided to quickly jot down a JavaScript (read <a href="http://jquery.com">jQuery</a>) based QA system. This time, the resulting MRR score over 198 questions was <b>0.384</b>, while only 79 questions had no answers found in the top 5 responses.</p>
<p>The results show clearly that during the last two years, Google has significantly improved at answering natural language queries. In fact (IIRC), my baseline system back in 2008 (which used <a href="http://www.cl.cam.ac.uk/~aac10/papers/rmrsdraft.pdf" target="_blank">RMRS</a>-based matching of sentences from the top 100 documents returned by an IR system) could only achieve an MRR score of approximately <b>0.290</b>, showing that the current results are much better than that baseline. I hope this decade sees some more developments/improvements in QA systems and I can ask a system <em>What do you get if you multiply six by nine?</em></p>
<blockquote><p>I&#8217;ve always said there was something fundamentally wrong with the universe. <strong>~Arthur Dent</strong></p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/google-as-a-question-answering-system/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Online English to Urdu Translator</title>
		<link>http://chaoticity.com/online-english-to-urdu-translator/</link>
		<comments>http://chaoticity.com/online-english-to-urdu-translator/#comments</comments>
		<pubDate>Sat, 23 Jan 2010 19:39:33 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[OpenWare]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Urdu]]></category>
		<category><![CDATA[english]]></category>
		<category><![CDATA[hindi]]></category>
		<category><![CDATA[translation]]></category>
		<category><![CDATA[transliteration]]></category>

		<guid isPermaLink="false">http://chaoticity.com/online-english-to-urdu-translator/</guid>
		<description><![CDATA[While all the online English to Urdu translators that I have seen don’t really work that well (read suck), if we make use of the overlapping vocabulary and grammar of Hindi and Urdu along with using Google’s translation API, things come out pretty decent (as mentioned in my previous post). Here’s a small 15 min first [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><p>While all the online English to Urdu translators that I have seen don’t really work that well (read suck), if we make use of the overlapping vocabulary and grammar of Hindi and Urdu along with using Google’s translation API, things come out pretty decent (<a href="http://chaoticity.com/how-do-you-transliterate-that/" target="_blank">as mentioned in my previous post</a>). Here’s a small 15-minute first-cut script which just uses English to Hindi translation and then transliterates from Hindi to Urdu. Feel free to use the code and do ping me if you improve something. This works as a Hindi to Urdu transliterator as well.</p>
<p><script src="http://www.google.com/jsapi" type="text/javascript"></script><script type="text/javascript">google.load("language", "1");var conv={};
	conv['ऀ']='';//'ऀ';
	conv['ँ']='ن'; 
	conv['ं']='ن';
	conv['ः']='ہ';
	conv['ऄ']='';//'ऄ';
	conv['अ']='اَ';
	conv['आ']='آ';
	conv['इ']='اِ';
	conv['ई']='اِی';
	conv['उ']='اُ';
	conv['ऊ']='اُو';
	conv['ऋ']='';//'ऋ';
	conv['ऌ']='';//'ऌ';
	conv['ऍ']='ای';
	conv['ऎ']='ऎ';
	conv['ए']='';//'ِ';
	conv['ऐ']='ائے';
	conv['ऑ']='';//'ऑ';
	conv['ऒ']='ؤ';
	conv['ओ']='او';
	conv['औ']='اؤ';
	conv['क']='ک';
	conv['ख']='کھ';
	conv['ग']='گ';
	conv['घ']='گھ';
	conv['ङ']='ن';
	conv['च']='چ';
	conv['छ']='چھ';
	conv['ज']='ج';
	conv['झ']='جھ';
	conv['ञ']='ن';
	conv['ट']='ٹ';
	conv['ठ']='ٹھ';
	conv['ड']='ڈ';
	conv['ढ']='ڈھ';
	conv['ण']='ن';
	conv['त']='ت';
	conv['थ']='تھ';
	conv['द']='د';
	conv['ध']='دھ';
	conv['न']='ن';
	conv['ऩ']='';//'ऩ';
	conv['प']='پ';
	conv['फ']='پھ';
	conv['ब']='ب';
	conv['भ']='بھ';
	conv['म']='م';
	//conv['य']='ے';
	conv['य']='ی';
	conv['र']='ر';
	conv['ऱ']='ऱ';
	conv['ल']='ل';
	conv['ळ']='';//ळ';
	conv['ऴ']='';//'ऴ';
	conv['व']='و';
	conv['श']='ش';
	conv['ष']='ش';
	conv['स']='س';
	conv['ह']='ہ';
	conv['ऺ']='';//'ऺ';
	conv['ऻ']='';//'ऻ';
	conv['़']='';//'़';
	conv['ऽ']='';//'ऽ';
	conv['ा']='ا';
	conv['ि']='ِ';
	conv['ी']='ی';
	conv['ु']='ُ';
	conv['ू']='وُ';
	conv['ृ']='ر';
	conv['ॄ']='';//'ॄ';
	conv['ॅ']='ی';
	conv['ॆ']='ء';
	conv['ै']='ی';
	//conv['े']='ے';
	conv['े']='ی';
	conv['ॉ']='';//'ا';
	conv['ॊ']='';//'ॊ';
	conv['ो']='و';
	conv['ौ']='و';
	conv['्']='';
	conv['ॎ']='';//'ॎ';
	conv['ॏ']='';//'ॏ';
	conv['ॐ']='';//'ॐ';
	conv['॑']='॑';
	conv['॒']='॒';
	conv['॓']='॓';
	conv['॔']='॔';
	conv['ॕ']='';//'ॕ';
	conv['ॖ']='';//'ॖ';
	conv['ॗ']='';//'ॗ';
	conv['क़']='ق';
	conv['ख़']='خ';
	conv['ग़']='غ';
	conv['ज़']='ز';
	conv['ड़']='ڑ';
	conv['ढ़']='ڑھ';
	conv['फ़']='ف';
	conv['य़']='';//य़';
	conv['ॠ']='';//'ॠ';
	conv['ॡ']='';//'ॡ';
	conv['ॢ']='';//'ॢ';
	conv['ॣ']='';//'ॣ';
	conv['।']='۔';
	conv['॥']='';//'॥';
	conv['0']='۰';
	conv['1']='۱';
	conv['2']='۲';
	conv['3']='۳';
	conv['4']='۴';
	conv['5']='۵';
	conv['6']='۶';
	conv['7']='۷';
	conv['8']='۸';
	conv['9']='۹';
	conv['॰']='॰';
	conv['ॱ']='';//'ॱ';
	conv['ॲ']='';//'ॲ';
	conv['ॳ']='';//'ॳ';
	conv['ॴ']='';//'ॴ';
	conv['ॵ']='';//'ॵ';
	conv['ॶ']='';//'ॶ';
	conv['ॷ']='';//'ॷ';
	conv['ॸ']='';//'ॸ';
	conv['ॹ']='';//'ॹ';
	conv['ॺ']='';//'ॺ';
	conv['ॻ']='';//'ॻ';
	conv['ॼ']='';//'ॼ';
	conv['ॽ']='';//'ॽ';
	conv['ॾ']='';//'ॾ';
	conv['ॿ']='';//'ॿ';
	conv['?']='؟';
	conv['.']='۔';
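	// Translate the English text to Hindi with Google's API, then map each
	// Devanagari character to its Urdu counterpart using the conv table.
	function xliterate() {
		var src = document.getElementById('src').value;
		google.language.translate(src, "en", "hi", function(result) {
				if (!result.error) {
					var mid = result.translation;
					var dest = '';
					for (var i = 0; i < mid.length; i++) {
						var ch = mid.charAt(i);
						// Characters without a mapping are copied through unchanged.
						dest += (conv[ch] !== undefined) ? conv[ch] : ch;
					}
					document.getElementById('dest').value = dest;
				} else {
					alert(result.error);
				}
			});
	}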
</script><br />
<textarea id="src" style="font-family: arial unicode;" cols="80" rows="5">How are you?</textarea><br />
<textarea id="dest" style="direction: rtl; font-family: arial unicode;"  cols="80" rows="5">آپ کیسی ہیں؟</textarea></p>
<input id="tx" onclick="xliterate(); return false;" type="button" value=" Translate "/>
<p>(Thanks to عزت مآب جناب آغا علی رضا قزلباش رحمتہ اللہ علیہ who graciously sent me his term report on Hindi to Urdu transliteration, from where I’ve copied (and modified) the character mapping.)</p>
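<p>The transliteration half of the script does not need the translation API at all. As a rough standalone sketch, the snippet below applies a four-entry subset of the mapping above to a Hindi word; <code>sampleMap</code> and <code>transliterate</code> are names invented for this example, and the real table is of course much larger.</p>

```javascript
// A tiny subset of the Devanagari-to-Urdu character map, for illustration only.
var sampleMap = {
  'क': 'ک',
  'ा': 'ا',
  'म': 'م',
  '।': '۔'
};

// Replace each character that has a mapping and leave the rest untouched.
function transliterate(text, map) {
  return text.split('').map(function (ch) {
    return Object.prototype.hasOwnProperty.call(map, ch) ? map[ch] : ch;
  }).join('');
}

console.log(transliterate('काम।', sampleMap)); // کام۔
```

<p>Because the lookup is strictly one character at a time, it inherits the limitations of simple character-to-character conversion: it knows nothing about context, spelling conventions or words that are written differently in the two scripts.</p>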
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/online-english-to-urdu-translator/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How do you transliterate that?</title>
		<link>http://chaoticity.com/how-do-you-transliterate-that/</link>
		<comments>http://chaoticity.com/how-do-you-transliterate-that/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 12:18:27 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://chaoticity.com/how-do-you-transliterate-that/</guid>
		<description><![CDATA[I am thinking of using Google’s English to Hindi translation and hooking it to a Hindi to Urdu transliterator to get an approximate English to Urdu translation. The Roman transliteration of Hindi provided by Google has some errors which might not be there if we convert directly to Urdu. For example, on translating the sentence [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><p>I am thinking of using Google’s English to Hindi translation and hooking it to a Hindi to Urdu transliterator to get an approximate English to Urdu translation. The Roman transliteration of Hindi provided by Google has some errors which might not be there if we convert directly to Urdu. For example, on translating the sentence</p>
<p><strong>It can be used in Urdu too</strong>, <a href="http://chaoticity.com/images/image6.png"><img title="image" style="border-right: 0px; border-top: 0px; display: inline; margin-left: 0px; border-left: 0px; margin-right: 0px; border-bottom: 0px" height="142" alt="image" src="http://chaoticity.com/images/image_thumb6.png" width="240" align="right" border="0" /></a> </p>
<p>we get the Hindi translation</p>
<p><strong>यह उर्दू में इस्तेमाल किया जा सकता है</strong> </p>
<p>and the Roman transliteration of the Hindi translation</p>
<p><em><strong>yaha urdū mēṁ istēmāla kiyā jā sakatā hai</strong></em>.</p>
<p>If you notice the first word, it should have been transliterated to “yeh”. Instead, we get a phonetic transliteration made up of the two letters <em>ya</em> and <em>ha</em>. Transliteration from Hindi to Urdu directly would have avoided that error. There’s a nice paper titled “<a href="http://www.crulp.org/clt09/download/Papers/Paper4.pdf" target="_blank">Hindi to Urdu Conversion: Beyond Simple Transliteration</a>” which lists problems faced in simple character-to-character transliteration from Hindi to Urdu. Whenever I get some time, I’ll try to cook up some JavaScript code quickly. Until then, the idea is open. Any takers?</p>
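<p>One plausible way to end up with “yaha” is a purely letter-by-letter romanisation in which every consonant keeps its inherent “a” unless a vowel sign or a virama follows. The JavaScript sketch below illustrates that assumption only; it is not Google’s algorithm, its tables cover just the letters of the example, and the names are invented for this illustration.</p>

```javascript
// Minimal tables covering only the letters needed for the example word.
var consonants = { 'य': 'y', 'ह': 'h' };
var vowelSigns = { 'ा': 'ā', 'ि': 'i', 'े': 'ē' };
var VIRAMA = '्';

// Letter-by-letter romanisation: every consonant carries an inherent "a"
// unless a vowel sign or a virama follows it.
function romanise(word) {
  var out = '';
  var chars = word.split('');
  chars.forEach(function (ch, i) {
    var next = chars[i + 1];
    if (consonants.hasOwnProperty(ch)) {
      out += consonants[ch];
      var suppressed = next === VIRAMA || vowelSigns.hasOwnProperty(next);
      if (!suppressed) {
        out += 'a';
      }
    } else if (vowelSigns.hasOwnProperty(ch)) {
      out += vowelSigns[ch];
    } else if (ch !== VIRAMA) {
      out += ch;
    }
  });
  return out;
}

console.log(romanise('यह')); // yaha
```

<p>Getting “yeh” out of the same two letters needs word-level knowledge about pronunciation, which a per-letter table cannot provide.</p>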
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/how-do-you-transliterate-that/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>What do you tweet about? : A shell script for getting most frequent words for twitter</title>
		<link>http://chaoticity.com/what-do-you-tweet-about-a-shell-script-for-getting-most-frequent-words-for-twitter/</link>
		<comments>http://chaoticity.com/what-do-you-tweet-about-a-shell-script-for-getting-most-frequent-words-for-twitter/#comments</comments>
		<pubDate>Fri, 18 Dec 2009 20:29:51 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[chaos]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[OpenWare]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://chaoticity.com/what-do-you-tweet-about-a-shell-script-for-getting-most-frequent-words-for-twitter/</guid>
		<description><![CDATA[There are a lot of web apps around which report your twitter stats. But at times, it&#8217;s better to do things yourself. I haven’t done any fun coding for ages now so last night, I finally got around to making a small program to gather twitter word statistics. The fun part was to do everything [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><p>There are a lot of web apps around that report your Twitter stats. But at times, it&#8217;s better to do things yourself. I haven’t done any fun coding for ages now, so last night I finally got around to making a small program to gather Twitter word statistics. The fun part was doing everything with Unix tools. <a href="http://chaoticity.com/software/tword.zip" target="_blank">Here’s a small script file</a> which displays the 10 most used words in the tweets of any Twitter id. I have only tested it under Cygwin, so this is probably the best place to say “USE AT YOUR OWN RISK”.</p>
<p>Here’s how it works.</p>
<ol>
<li>It downloads all status information into a directory.</li>
<li>It extracts the status message lines.</li>
<li>It does some regex magic and filters out <a href="http://en.wikipedia.org/wiki/Stop_words" target="_blank">stop words</a> like the, a, an etc. (I haven’t seen this done anywhere before, but the <strong>join</strong> command comes in handy for processing stop words.)</li>
<li>It displays the 10 most frequent words (and emoticons).</li>
</ol>
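<p>For the curious, here is a rough sketch of how steps 2 to 4 could be wired together with standard Unix tools. It is not the contents of tword.sh: the function name top_words, its arguments and the whitespace-only tokenising are assumptions made purely for illustration. It reads text on standard input, drops every word found in a stop word file (one word per line) using <strong>join</strong>, and prints the most frequent remaining words with their counts.</p>

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual tword.sh.
# Usage: top_words STOPWORD_FILE [N]
# Reads text on stdin and prints the N (default 10) most frequent words
# that do not appear in STOPWORD_FILE (one stop word per line).
top_words() {
    stopfile=$1
    n=${2:-10}

    # join needs both inputs sorted the same way, so sort the stop words
    # into a temporary file using the plain C locale.
    sorted_stop=$(mktemp)
    LC_ALL=C sort -u -o "$sorted_stop" "$stopfile"

    # 1. lowercase everything
    # 2. put one word per line (split on whitespace only, so emoticons
    #    and hashtags survive as "words")
    # 3. drop empty lines
    # 4. sort, then keep only the lines with no match in the stop list
    #    (join -v 1 prints the unpairable lines of the first input)
    # 5. count duplicates, order by count (highest first), keep the top N
    # 6. tidy the "   3 word" layout of uniq -c into "3 word"
    tr '[:upper:]' '[:lower:]' |
        tr -s '[:space:]' '\n' |
        grep -v '^$' |
        LC_ALL=C sort |
        LC_ALL=C join -v 1 - "$sorted_stop" |
        uniq -c |
        LC_ALL=C sort -k1,1nr -k2 |
        head -n "$n" |
        awk '{print $1, $2}'

    rm -f "$sorted_stop"
}
```

<p>With a (hypothetical) file of tweets called tweets.txt and a stop word list called stopwords.txt, this would be run as <em>cat tweets.txt | top_words stopwords.txt 10</em>. The one thing to watch out for is that <strong>join</strong> expects both of its inputs to be sorted in the same order, which is why the sketch forces the C locale for both sorts.</p>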
<p>Twitter limits the number of messages that you can download to 3200. Also, the timeline of the Twitter id has to be public for this script to work. All you need to do is <a href="http://chaoticity.com/software/tword.zip" target="_blank">download the script file and stop word list</a>, keep them in the same directory and run the script with the Twitter id on the command line. You’ll get the list of words with the frequency at the start of each line. For example,</p>
<blockquote><p>$ ./tword.sh barackobama      <br />161 watch       <br />119 live       <br />92 http://mybarackobamacom/livestream       <br />81 health       <br />63 reform       <br />55 today       <br />52 rally       <br />48 #hc09       <br />47 &amp;amp;       <br />38 vote</p>
</blockquote>
<p>The script takes time to complete, so be patient. As you may have noticed, there are still HTML entities such as &amp;amp; in the output. You can remove them by piping the text through any html2text program. There’s a small perl script in the zipfile which does this processing. To use it, you will need to install <a href="http://search.cpan.org/~gaas/HTML-Parser-3.64/lib/HTML/Entities.pm" target="_blank">HTML::Entities</a> through CPAN and then add the pipe to the script yourself. The output now brings in a new word, “change”.</p>
<blockquote><p>$ ./tword.sh barackobama      <br />161 watch       <br />119 live       <br />92 http://mybarackobamacom/livestream       <br />83 health       <br />68 change       <br />63 reform       <br />55 today       <br />55 rally       <br />48 #hc09       <br />39 vote</p>
</blockquote>
<p>My list toppers are <strong>good, :D, time, day, twitter, read, hope, back, :p</strong> and <strong>make</strong>. I wonder if this makes me a happy person :)</p>
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/what-do-you-tweet-about-a-shell-script-for-getting-most-frequent-words-for-twitter/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>5 minutes of FM ads</title>
		<link>http://chaoticity.com/5-minutes-of-fm-ads/</link>
		<comments>http://chaoticity.com/5-minutes-of-fm-ads/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 13:33:00 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[chaos]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[Politics]]></category>

		<guid isPermaLink="false">http://chaoticity.com/5-minutes-of-fm-ads/</guid>
		<description><![CDATA[I get to listen to some FM radio ads every now and then (don’t ask why) but today’s line up was rather interesting. !ہر چیز میزان میں، اچھی لگتی ہے (append more ridiculously childish lyrics here) Warid is asking us to start using our SIMs again by uttering the spell &#34;کھل جا SIM SIM&#34;. Not [...]]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><p>I get to listen to some FM radio ads every now and then (don’t ask why) but today’s line up was rather interesting.</p>
<ol>
<li>!ہر چیز میزان میں، اچھی لگتی ہے (append more ridiculously childish lyrics here)</li>
<li>Warid is asking us to start using our SIMs again by uttering the spell &quot;کھل جا SIM SIM&quot;. Not a bad techie pun, even though it’s a bit corny.</li>
<li>Junaid Jamshed is telling us that it’s OK to get Lays (no ungrammatical pun intended). Apparently he has done the research himself and was ‘surprised’ to find out that they are made in Pakistan! (no really? wow! ہم کتنی ترقی کر گٰے ہیں) Also that they are 100% حلال (with all stress on ح ، حلق سے). Personally, I prefer Slanty.</li>
<li>The Prime Minister has established a new “<a href="http://www.app.com.pk/en_/index.php?option=com_content&amp;task=view&amp;id=75792&amp;Itemid=2" target="_blank">Special Fund for Victims of Terrorism</a>”. Perhaps it’s a step towards not using the three-letter acronym (IDP). I was wondering if this makes people from the US and the UK eligible as well.</li>
</ol>
<p> All this happened while “<a href="http://www.419baiter.com/_scam_emails/0-rzz-09-04b/scam-email-045415.shtml" target="_blank">Miss Recheal Goodluck</a>” was dropping me an email telling me that she is interested in me *blush* *blush*</p>
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/5-minutes-of-fm-ads/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NCA shows off its students</title>
		<link>http://chaoticity.com/nca-shows-off-students/</link>
		<comments>http://chaoticity.com/nca-shows-off-students/#comments</comments>
		<pubDate>Sat, 24 Jan 2009 15:34:15 +0000</pubDate>
		<dc:creator>awais</dc:creator>
				<category><![CDATA[Language]]></category>
		<category><![CDATA[Ambiguity]]></category>
		<category><![CDATA[BBC]]></category>
		<category><![CDATA[Headline]]></category>

		<guid isPermaLink="false">http://chaoticity.com/?p=26</guid>
		<description><![CDATA[I am sure BBC Urdu never does this intentionally but every once in a while, they put up an ambiguous heading. This one, roughly translated, can either mean &#8216;Exhibition of students from NCA&#8217; or &#8216;Exhibition by students from NCA&#8217;. (You may NOT comment about the font problem here :&#62; )]]></description>
			<content:encoded><![CDATA[
		<div style="clear:both;"></div><div id="attachment_27" class="wp-caption alignleft" style="width: 161px"><img class="size-full wp-image-27" title="BBC heading gone wrong (again)" src="http://chaoticity.com/images/nca.png" alt="BBC heading gone wrong (again)" width="151" height="174" /><p class="wp-caption-text">BBC heading gone wrong (again)</p></div>
<p>I am sure BBC Urdu never does this intentionally but <a href="http://scalar.wordpress.com/2008/04/21/when-headlines-go-wrong/" target="_blank">every once in a while</a>, they put up an ambiguous heading. This one, roughly translated, can either mean &#8216;Exhibition of students from NCA&#8217; or &#8216;Exhibition by students from NCA&#8217;.</p>
<p>(You may NOT comment about the font problem here :&gt; )</p>
]]></content:encoded>
			<wfw:commentRss>http://chaoticity.com/nca-shows-off-students/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

