ML 101 - Data Grounding and RAG
Data Grounding
Prompt: "What is the battery capacity of the latest Tesla Model 3?"
- Without Data Grounding: A model trained only up to 2023 might say:
"The Tesla Model 3 has a battery capacity of up to 75 kWh."
This could be outdated if the model has changed.
- With Data Grounding Using RAG: The model checks Tesla's latest data:
"The 2025 Tesla Model 3 has a battery capacity of up to 82 kWh."
Imagine an elderly librarian (our LLM) who is incredibly knowledgeable but hasn't left the library in years. They know an enormous amount, but their knowledge is frozen in time. Data grounding is like giving them a smartphone with internet access.
Now, instead of relying solely on their memory, they can provide up-to-date information by checking reliable external sources.
Ways to ground a model:
- RAG: Combining retrieval with generation for real-time accuracy.
- Prompt Engineering: Crafting prompts that embed necessary context or examples (see the sketch after this list).
- Fine-Tuning: Updating the model's weights with domain-specific data.
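To make the prompt engineering route concrete, here is a minimal Python sketch (not any specific library's API) that embeds externally supplied facts in the prompt. `call_llm` and the Tesla fact are illustrative placeholders.

```python
# A minimal sketch of grounding through prompt context injection.
# call_llm() is a placeholder for whatever LLM client you actually use.

def build_grounded_prompt(question: str, facts: list[str]) -> str:
    """Embed up-to-date facts in the prompt so the model answers from them."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

facts = ["The 2025 Tesla Model 3 has a battery capacity of up to 82 kWh."]
prompt = build_grounded_prompt(
    "What is the battery capacity of the latest Tesla Model 3?", facts
)
# answer = call_llm(prompt)   # swap in your own model call here
```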
Retrieval-Augmented Generation (RAG)
Imagine you're searching for a specific book in a huge library. You know the library has all sorts of information you need, but flipping through every single page or shelf would be time-consuming. Instead, you use catalogs and indexes to find the relevant books quickly. Once you have those books in hand, you summarize or combine their content to answer your question.
That, in a nutshell, is RAG:
- Retrieve the relevant data (like pulling the right books off the shelf).
- Generate an answer (like creating a summary from those books).
Why RAG?
LLMs are powerful, but they come with major constraints:
- Limited Knowledge Window: They only know what existed up to their last training data (the knowledge cutoff).
- Bias from Training Data
- Hallucinations: Because they generate text based on probabilities of what word should come next, they sometimes make things up.
RAG addresses these limitations by letting your LLM "look up" new or private data (like going to that library) without retraining the entire model from scratch. This is especially important for enterprises that have a lot of proprietary information.
RAG Architecture
RAG typically involves two main steps:
- Retrieval: This is where you "search the library." A retrieval component (often a semantic search engine) scans the external data sources to find relevant chunks of text.
- Generation: After the relevant texts are retrieved, the LLM "reads" them and generates the final answer, ideally grounded in those retrieved facts.
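A rough sketch of these two steps in Python, assuming a hypothetical `search_index` object with a `.search(query, top_k)` method and a generic `call_llm` function; neither refers to a real library.

```python
# Sketch of the two RAG steps: retrieve, then generate.
# `search_index` and `call_llm` are placeholders for your own retriever and LLM client.

def retrieve(query: str, search_index, top_k: int = 5) -> list:
    """Step 1: 'search the library' for the most relevant text chunks."""
    return search_index.search(query, top_k=top_k)

def generate(query: str, chunks: list, call_llm) -> str:
    """Step 2: have the LLM answer, grounded in the retrieved chunks."""
    sources = "\n\n".join(chunks)
    prompt = (
        f"Use only the sources below to answer.\n\nSources:\n{sources}\n\n"
        f"Question: {query}\nAnswer:"
    )
    return call_llm(prompt)

# chunks = retrieve("What are the key market trends for wearable devices in 2025?", search_index)
# answer = generate("What are the key market trends for wearable devices in 2025?", chunks, call_llm)
```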
RAG Example
- User Query: "What are the key market trends for wearable devices in 2025?"
- Index Search: The system checks an index (like a library catalog) built from corporate research documents on wearable devices.
- Retrieve Documents: Relevant research papers, whitepapers, or internal reports are found. Let's say 3–5 documents are deemed most relevant.
- LLM Generation: The LLM then processes these 3–5 documents and crafts a synthesized response. Instead of relying on its "old" training data, it now grounds its answer in these up-to-date, private documents.
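The index search in step 2 is often a semantic (embedding-based) search. Here is a toy version using cosine similarity; `embed()` is a stand-in for whatever embedding model you use, and the commented lines show how it would be wired up.

```python
# Toy semantic index for the example above, using cosine similarity over embeddings.
# embed() is a placeholder for any embedding model; the documents are illustrative.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k_documents(query_vec, doc_vecs, docs, k=3):
    """Rank documents by similarity to the query and keep the best k."""
    scores = [cosine(query_vec, vec) for vec in doc_vecs]
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

# doc_vecs  = [embed(doc) for doc in docs]   # built once, offline, when indexing
# query_vec = embed("What are the key market trends for wearable devices in 2025?")
# context   = top_k_documents(query_vec, doc_vecs, docs, k=3)   # feed these to the LLM
```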
Precision and Recall
LLMs can tolerate some noise in their context, so design your system to maximize recall upstream and enforce precision downstream through reranking, validation, or filters (a sketch of this two-stage pattern follows the table below).
| Stage | Focus | Explanation |
|---|---|---|
| 🔍 Retriever | High Recall | “Find anything that might help.” |
| 🧠 LLM + Reranker | High Precision | “Now pick only the most relevant and trustworthy.” |
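One way to realize this split, sketched with placeholder `retriever` and `rerank_score` components rather than any particular framework:

```python
# Two-stage pattern from the table: over-fetch for recall, then filter for precision.
# `retriever` and `rerank_score` are placeholders for your own components.

def retrieve_then_rerank(query, retriever, rerank_score, wide_k=50, final_k=5):
    candidates = retriever.search(query, top_k=wide_k)        # high recall: cast a wide net
    scored = [(rerank_score(query, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)       # high precision: keep only the best
    return [c for _, c in scored[:final_k]]
```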
Imagine you're a detective investigating a crime. You have a list of suspects, and you need to find the one who committed the crime. Precision asks: of the people you arrest, how many are actually guilty? Recall asks: of the guilty, how many did you manage to catch?
| Metric | What it Means | Pros | Cons |
|---|---|---|---|
| High Precision | Most predicted positives are correct | Accurate predictions, low false alarms | Might miss actual positives (low recall) |
| Low Precision | Many predicted positives are wrong | Might catch all actual positives (if recall is high) | Many false alarms |
| High Recall | Most actual positives are found | Rarely misses important positives | Might include irrelevant or wrong positives (low precision) |
| Low Recall | Many actual positives are missed | Very selective predictions | Misses important positive cases |
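A quick worked example with made-up numbers shows how the two metrics are computed:

```python
# Worked example with made-up numbers: a retriever returns 10 chunks, 6 of them are
# actually relevant, and the corpus contains 8 relevant chunks in total.
true_positives = 6     # relevant chunks that were retrieved
false_positives = 4    # retrieved chunks that were not relevant (10 - 6)
false_negatives = 2    # relevant chunks that were missed (8 - 6)

precision = true_positives / (true_positives + false_positives)   # 6 / 10 = 0.60
recall    = true_positives / (true_positives + false_negatives)   # 6 / 8  = 0.75
print(f"precision={precision:.2f}, recall={recall:.2f}")
```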
| Common Use Case | Optimize For | Why? |
|---|---|---|
| Intent Detection (chatbots) | High Precision, Low Recall | Wrong intents lead to confusing replies. Better to admit uncertainty than be wrong. |
| Information Retrieval (RAG) | Moderate Precision, High Recall | Gather all potentially relevant context. The LLM can handle noise via reranking. |
| Question Answering | High Precision (recall secondary) | Trust depends on correctness. Better to say “I don’t know” than hallucinate. |
| Summarization | High Precision (recall secondary) | The summary must reflect the source truthfully. Don’t invent information. |
| Code Generation | High Precision (recall secondary) | Invalid code breaks functionality. Avoid hallucinated or broken output. |
Assessing RAG
To assess model performance, we use the following metrics (rated between 0 and 1, higher is better):
- Faithfulness (Generation): Does the answer stick to what the documents say, or is it making things up?
- Context Recall (Retrieval): Did we grab all the relevant info, or did we miss something useful?
- Context Precision (Retrieval): How much of what we retrieved is actually relevant?
- Factual Correctness (Generation): Are the facts in the answer actually true based on the retrieved content?
- Answer Semantic Similarity (Generation): Even if phrased differently, does the answer mean the same as a good reference answer?
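As a minimal illustration of the two retrieval-side metrics, here is a library-free sketch that scores retrieval by document ID. The IDs and relevance labels are invented for the example; in practice a dedicated evaluation framework (e.g., Ragas) would compute these alongside the generation-side metrics.

```python
# Library-free sketch of the two retrieval metrics (0..1, higher is better).
# The IDs and the "relevant" set are illustrative; in practice a human (or an LLM judge)
# labels which chunks are relevant to the question.

def context_precision(retrieved: list, relevant: set) -> float:
    """Share of retrieved chunks that are actually relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for doc_id in retrieved if doc_id in relevant) / len(retrieved)

def context_recall(retrieved: list, relevant: set) -> float:
    """Share of relevant chunks that made it into the retrieved set."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in relevant if doc_id in set(retrieved)) / len(relevant)

retrieved = ["doc1", "doc2", "doc7"]    # what the retriever returned
relevant  = {"doc1", "doc2", "doc3"}    # what an annotator marked as relevant
print(context_precision(retrieved, relevant))   # 2/3 ≈ 0.67
print(context_recall(retrieved, relevant))      # 2/3 ≈ 0.67
```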