How do you explain Retrieval Augmented Generation (RAG) and agents to a 10-year-old? I had to do it last week. It started when I asked them if they knew how AI works.

“AI, like ChatGPT? You type a question in and it searches for the answer”

“Yes, but not all AIs have access to the Internet so sometimes the answer is not correct. Let me give you an example. Let’s say you have a friend … let’s call them Lilly. So you ask Lilly, ‘Who is the President of the United States?’, and they reply ‘Joe Biden’. Is that the correct answer?”

“No”

“Right, but Biden was president when Lilly last learned about this topic so they just answered what they knew. Let’s say there’s another friend there… we’ll call them Sara. Sara searches your question on the Internet and shares the contents of the top few links of the results with Lilly. Now Lilly can read the latest information and answer your question correctly. What would they say?”

“They would say sorry for being wrong and then say ‘Donald Trump’ because they learned it from Sara.”

“Heh yes, they will say sorry because they have been trained to be polite! And they will answer correctly, but there’s a catch! Lilly has (anterograde) amnesia, which means they cannot learn anything new. If you ask the same question sometime later, they would have forgotten the answer and they’ll have to ask Sara again. So Sara retrieved some possible answers – by searching like anyone else – and augmented or added to Lilly’s knowledge. Then Lilly used it to generate the right answer. So together they did Retrieval Augmented Generation. And then Lilly forgot what Sara told them. Makes sense?”

“hmmm (head nod)”

“Awesome! So the names weren’t random. Sara is the search engine and Lilly is the LLM”

“Ooooh! (long pause) What’s an LLM?”

“An LLM is a Large Language Model”

“Lilly Language Model!” (jumping up)

“Yes, that too! Think of it this way. If you give Lilly half a sentence, they can predict the next word that comes after it. Like how would Lilly complete the sentence ‘The president of the United States is…’? They would say Biden. But if they know when and how to search the Internet, they will say Trump every time. Now they are no longer just an LLM, they have become an agent! They can find the answers on their own by using the search tool if needed.”

“Secret Agent!”

“Hehe, yes, there may be some LLMs which are secretly agents but let’s not go there. For now, let’s just say that agents can take actions to reach their goal and LLMs can’t. Do you want to make an agent?”

“I am hungry, can I have orange juice?”


We skipped the juice (if you are hungry, eat something healthy!), but the conversation got me thinking about how different a world our children are growing up in. With their AIs and agents and RAGs, access to comprehensible knowledge will not be a problem for this generation, whereas we had to sift through dozens of library shelves, ransack bookstores, and look underneath every rock-like object to find books and teachers to learn from. Google changed our searching habits a bit, but you have to know how to use it properly, and keyword-based queries are no substitute for questions asked in natural language. If companies wanted their users to be able to search their internal data and knowledge silos, they had to set up their own search systems, mostly powered by Lucene-based tech (Solr, Elasticsearch, etc.). Traditional question-answering systems were just not powerful enough, so FAQs were the way to go.

And then the LLM leap happened, and everyone rushed to “AI-enable” their websites with chatbots. Until recently, the quickest option to do so was to have a RAG in place. Since off-the-shelf LLMs have a knowledge cut-off date, any “extra” knowledge in these RAGs comes from search results that are fed to the LLM. This means breaking that knowledge down into processable chunks, converting the chunks into vectors (aka embeddings), and storing them in a vector database. When the user asks a question, their query is converted into a vector the same way. The system then searches the database for vectors similar to the query vector (remember dot product?), and the chunks corresponding to the top results are fed to the LLM as context with a prompt such as:

```
Use the following pieces of context to answer the question. If you don’t know the answer, just say that you don’t know.
Question: {question}
Context: {context}
Answer:
```

The LLM, like any good machine, follows the prompt and generates the correct answer (if it’s in the context), mostly without hallucinating.
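To make those steps concrete, here is a minimal sketch of the pipeline in Python. Everything in it is illustrative: `embed()` is a toy stand-in for a real embedding model, a plain list of tuples stands in for the vector database, and the `ask_llm()` call at the end is a hypothetical placeholder for an actual LLM API.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: a normalized bag-of-characters vector.
    A real system would call an embedding model here."""
    vec = np.zeros(128)
    for ch in text.lower():
        vec[ord(ch) % 128] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# 1. Index time: break the knowledge into chunks and store their vectors.
chunks = [
    "Donald Trump won the 2024 US presidential election.",
    "The capital of France is Paris.",
]
index = [(embed(chunk), chunk) for chunk in chunks]  # stand-in for a vector database

# 2. Query time: embed the question the same way and rank chunks by similarity.
question = "Who is the President of the United States?"
q_vec = embed(question)
ranked = sorted(index, key=lambda pair: float(q_vec @ pair[0]), reverse=True)
context = "\n".join(chunk for _, chunk in ranked[:1])

# 3. Generation: hand the top chunks to the LLM with the prompt above.
prompt = (
    "Use the following pieces of context to answer the question. "
    "If you don't know the answer, just say that you don't know.\n"
    f"Question: {question}\n"
    f"Context: {context}\n"
    "Answer:"
)
# answer = ask_llm(prompt)  # hypothetical call to an actual LLM API
```

Because the toy vectors are normalized, that `@` dot product is exactly the cosine similarity hinted at above; real systems do the same ranking, just with learned embeddings and an approximate nearest-neighbour index.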

But why am I describing this in the past tense? The answer is in three letters: MCP. And that is a story for the next post.