
Boosting Search with MCP
With big data comes big responsibility – of making it FAIR: Findable, Accessible, Interoperable and Reusable. But ensuring data is truly FAIR is challenging as scaling search systems to handle big data requires significant computational resources. Let’s take an example from a user’s perspective. Say you are using BioStudies, our general purpose repository for publishing life sciences data, and want to download data from all functional genomics studies on chickens this year that use the sequencing assay technology....

The Grapes of RAG
How do you explain Retrieval Augmented Generation (RAG) and agents to a 10-year-old? I had to do it last week. It started when I asked them if they knew how AI works. “AI, like ChatGPT? You type a question in and it searches for the answer.” “Yes, but not all AIs have access to the Internet, so sometimes the answer is not correct. Let me give you an example....

My Language - My AI
Remember watching Arrival? A linguist tasked with communicating with aliens learns that their language lets her perceive time non-linearly, reshaping her understanding of reality. The plot brought the Sapir-Whorf hypothesis, the idea that language actively shapes the way we think and determines what we can think about, back into the mainstream. In the context of LLMs and Chain of Thought (CoT) reasoning, the hypothesis becomes particularly relevant since the language of thought quite literally determines the quality of computational output....

UQA - Corpus for Urdu Question Answering
I think it was around 1999 when I first heard that Urdu is a low-resource language. 25 years later, Urdu is still considered low-resource despite having over 70 million native speakers. This is because the large, manually curated linguistic resources required for training are still not available for Urdu. One way around this barrier is to translate existing corpora from a resource-rich language (cough English cough) to Urdu. This seems like a chicken-and-egg problem, though, since good automatic translation systems require training resources....

Concordance of Iqbal's Urdu Poetry | کشف الالفاظ - اردو کلامِ اقبال
tl;dr: I built a KWIC concordance of Iqbal’s Urdu poetry. Check out iqbal.chaoticity.com! In distributional semantics, just like in the interpretation of law, a word is known by the company it keeps. Observing how a word appears in the context of other words allows us to identify usage nuances and gain deeper insights into the meaning and connotations of that particular term. Traditional word indexes do not show this context and thus miss out on the most remarkable ability of the human brain – recognising patterns....
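
For readers new to the idea, here is a minimal sketch of a keyword-in-context (KWIC) lookup. It is only an illustration, not the code behind the concordance site: the function name `kwic`, the `width` parameter, and the whitespace tokenisation are all assumptions made for this example.

```python
def kwic(text, keyword, width=3):
    """Return (left, keyword, right) triples for each occurrence of keyword.

    Assumes plain whitespace tokenisation; a real concordance would
    need proper tokenisation and normalisation for Urdu text.
    """
    tokens = text.split()
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            # Up to `width` tokens on either side of the match.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


if __name__ == "__main__":
    sample = "a word is known by the company it keeps and a word has context"
    for left, key, right in kwic(sample, "word", width=2):
        # Right-align the left context so the keyword lines up in a column.
        print(f"{left:>12} [{key}] {right}")
```

Lining the keyword up in a single column is what lets the eye scan the surrounding words for recurring patterns.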

#AcademicValentines
Roses are red / Violets are blue / Everyone’s equal / When σ is μ

Almost eleven
This blog is almost 11 now. I’ll try switching to an Android client this year, just to see if it makes any difference to my posting frequency. Meanwhile, here’s a ‘test swan’.

caffeine chronicles
Found this “graph” I created for #CraftyDataViz some time ago. It managed to get an honourable mention in the “Most Fun” category, so a much belated yay! :D (here’s the original tweet)

Of Mir and Meerkats
Breakfast Doodle

Accommodation in Cambridge
About 3-4 times a year, I get asked by students, visiting scholars, and people moving here for work how to find accommodation in Cambridge. I usually reply by digging up the last few such emails and tweaking my response a little, but it just crossed my mind that blogging about it might be more helpful. So here are some things I have learned about finding accommodation in Cambridge....