RAG does not teach a model new facts. It fetches relevant text and pastes it into the prompt, so the model answers from documents instead of memory.
Retrieval-augmented generation is one of the most useful patterns in applied AI, and also one of the most misunderstood. The common misconception is that RAG trains the model on your documents, or that it makes the model smarter. It does neither. RAG does not change the model's weights at all. It is a search step bolted onto a generation step. When a question comes in, the system finds relevant text from a collection you control, drops that text into the prompt, and asks the model to answer using it. The model is still working from its context window, the same as always. RAG just decides what goes into that window.
How it actually works
The pipeline has two halves, and the first half is the part people skip when they explain it.
- Indexing, done ahead of time. You take your documents, split them into chunks, and convert each chunk into an embedding, a vector that captures its meaning. You store those vectors in a database built for similarity search.
- Retrieval and generation, done at query time. You embed the user's question the same way, find the chunks whose vectors are closest to it, and paste the top matches into the prompt alongside the question. The model reads those chunks and writes an answer.
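The two halves above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `embed` here is a toy bag-of-words stand-in for a real embedding model (it only captures word overlap, not meaning), the index is a plain list rather than a vector database, and the function names, sample documents, and prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(text, max_words=40):
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_index(documents, max_words=40):
    """Indexing, done ahead of time: chunk each document and embed each chunk."""
    return [(piece, embed(piece)) for doc in documents for piece in chunk(doc, max_words)]


def retrieve(index, question, k=2):
    """Query time: embed the question the same way and return the k closest chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, chunks):
    """Paste the retrieved chunks into the prompt alongside the question."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up documents for illustration.
documents = [
    "Orders ship within two business days. Shipping is free on orders over fifty dollars.",
    "You can return a purchase for credit within thirty days of delivery.",
    "Support is available by email on weekdays.",
]

index = build_index(documents)
question = "How many days do I have to return a purchase?"
top_chunks = retrieve(index, question, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)  # this string is what would be sent to the model
```

Nothing in this sketch touches the model itself. The only output is a prompt string, which is the point: the retrieval step decides what lands in the context window, and the model call that follows is an ordinary generation request.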
The matching is the quiet engine of the whole thing. Because similar meanings produce nearby vectors, a question about "refund policy" can surface a paragraph that never uses the word "refund" but talks about "returning a purchase for credit." That semantic match is what lets RAG find passages that keyword search would miss. It is also where RAG most often fails: if retrieval pulls the wrong chunks, the model answers from the wrong source.
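To make "nearby vectors" concrete, here is a small sketch that ranks chunks by cosine similarity, one common closeness measure. The three-dimensional vectors are written by hand purely for illustration; a real embedding model would produce them, typically with hundreds or thousands of dimensions, and the chunk texts are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical embeddings, hand-written so that similar meanings sit close together.
query_text = "refund policy"
query_vec = [0.9, 0.1, 0.2]

chunks = {
    "Returning a purchase for credit is allowed within 30 days.": [0.8, 0.2, 0.3],
    "Our offices are closed on public holidays.": [0.1, 0.9, 0.2],
    "Policy updates are announced by email.": [0.3, 0.4, 0.8],
}

scores = {text: cosine(query_vec, vec) for text, vec in chunks.items()}
best = max(scores, key=scores.get)

# Words the query and the best chunk literally share (none, in this example).
best_words = set(best.lower().replace(".", "").split())
keyword_overlap = set(query_text.split()) & best_words

for text, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{score:.3f}  {text}")
```

With these made-up vectors, the top-ranked chunk shares no words with the query, while the chunk that does contain "policy" ranks lower. The same mechanism cuts the other way: if the vectors place an irrelevant chunk closest to the question, that chunk is what the model will read.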






