RAG vs fine-tuning

RAG injects external, up-to-date context at inference time by retrieving documents. Fine-tuning updates model weights on a task-specific dataset so behavior is “baked in.” They solve different problems and are often combined.

01 Retrieval-augmented generation (RAG)

A user query is embedded and matched against a vector index (or a hybrid of vector and keyword search), and the top-ranked passages are prepended to the prompt. The LLM conditions on this evidence, which tends to reduce hallucination when the corpus is trustworthy and well chunked.
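The flow described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: `embed` is a toy bag-of-words stand-in for a real embedding model, a linear scan over a list stands in for the vector index, and the function names, corpus, and prompt wording are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts instead of a learned model."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Prepend the retrieved passages (numbered, so they can be cited) to the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical three-passage corpus, for illustration only.
corpus = [
    "The refund window is 30 days from delivery.",
    "Shipping takes 3 to 5 business days.",
    "Support is available on weekdays.",
]
query = "How long is the refund window?"
top = retrieve(query, corpus, k=1)
prompt = build_prompt(query, top)
```

The string in `prompt` is what would be sent to the LLM. Swapping in a real embedding model and an approximate-nearest-neighbor index changes `embed` and `retrieve` but leaves the overall shape the same.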

Figure — RAG flow

Strengths: fresh facts without retraining, citations possible, good for proprietary docs. Weaknesses: answer quality is capped by retrieval quality, so chunking strategy and hybrid (keyword plus vector) search matter a lot.
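To make the chunking and hybrid-search points concrete, here is a small sketch of two common building blocks: fixed-size word chunks with overlap, and reciprocal rank fusion (RRF) for merging a vector ranking with a keyword ranking. The function names, the chunk sizes, and the constant `k = 60` (a commonly used default for RRF) are assumptions chosen for illustration, not recommendations from the text above.

```python
from collections import defaultdict


def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into windows of `size` words; adjacent windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Ten placeholder words, chunked into windows of 4 with 2 words of overlap.
chunks = chunk_words("w0 w1 w2 w3 w4 w5 w6 w7 w8 w9", size=4, overlap=2)

# Hypothetical rankings of the same three documents from two retrievers.
vector_ranking = ["a", "b", "c"]
keyword_ranking = ["b", "c", "a"]
fused = rrf_fuse([vector_ranking, keyword_ranking])
```

RRF only needs ranks, not raw scores, which is why it is a convenient way to combine retrievers whose scores are not on the same scale.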

02 Fine-tuning (supervised / preference)

You continue training the model (full weights, or LoRA/QLoRA adapters) on curated examples: instruction-output pairs, preference rankings (for RLHF or DPO), or domain text. The model internalizes style, format, and domain vocabulary. Because the weights change, versioning and evaluation are essential.
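As a rough illustration of why adapter methods such as LoRA are called parameter-efficient, the sketch below applies the usual low-rank update W' = W + (alpha / r) · B · A to a tiny weight matrix and compares trainable parameter counts. It uses plain Python lists rather than a training framework, and the helper names and example sizes are assumptions for the example; it shows only the arithmetic of the update, not a training loop.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product, enough for a tiny worked example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha: float):
    """Return W + (alpha / r) * B @ A, with B of shape d_out x r and A of shape r x d_in."""
    r = len(a)  # adapter rank = number of rows in A
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out: int, d_in: int, r: int) -> dict[str, int]:
    """Parameters updated by full fine-tuning versus a rank-r adapter on one matrix."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}


# Frozen base weight (2 x 3) and a rank-1 adapter.
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
a = [[1.0, 0.0, -1.0]]   # r x d_in
b_init = [[0.0], [0.0]]  # d_out x r, zero at initialization

# With B = 0 the adapter is a no-op, so the model starts out identical to the base.
unchanged = lora_effective_weight(w, a, b_init, alpha=2.0)

# After some (hypothetical) training, B is non-zero and shifts the effective weight.
b_trained = [[0.5], [0.0]]
adapted = lora_effective_weight(w, a, b_trained, alpha=2.0)

counts = trainable_params(d_out=4096, d_in=4096, r=8)
```

For a 4096 x 4096 matrix at rank 8, the adapter trains 65,536 parameters instead of 16,777,216. Because the base weights stay frozen, an adapter can also be versioned and rolled back as a separate artifact.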

Figure — fine-tuning (conceptual)

Strengths: stable behavior, specific output formats, can reduce prompt length. Weaknesses: data must be high quality; risk of catastrophic forgetting (losing general capabilities); the update cycle is slower than editing a document store.

03 Side-by-side

Dimension	RAG	Fine-tuning
Factual freshness	Strong: update the corpus/index	Weak: new facts require retraining
Style / tone / format	Prompt-dependent	Strong, if reflected in the training data
Latency & cost	Extra retrieval step plus longer context	Inference cost similar to the base model (after tuning)
Privacy / compliance	Control access at index layer	Data in training pipeline; audit training sets
When data is scarce	Works if docs exist	Risk of overfitting; prefer parameter-efficient methods

04 Decision sketch

Figure — coarse decision flow

Many products use RAG for grounding and fine-tuning (or adapters) for tone and tool-use format. Evaluate end-to-end task success, not single-component accuracy alone.
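The gap between component accuracy and end-to-end success can be made visible with a small evaluation harness. The sketch below is one possible shape under stated assumptions: the `EvalCase` fields, the normalized exact-match criterion, and the four example cases are all hypothetical, and real evaluations typically need a more forgiving answer metric.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    gold_doc_id: str          # document that contains the answer
    retrieved_ids: list[str]  # what the retriever returned
    expected_answer: str
    model_answer: str         # final output of the full system


def normalize(text: str) -> str:
    """Case- and whitespace-insensitive comparison key."""
    return " ".join(text.lower().split())


def retrieval_hit_rate(cases: list[EvalCase]) -> float:
    """Component metric: how often the gold document was retrieved."""
    return sum(c.gold_doc_id in c.retrieved_ids for c in cases) / len(cases)


def task_success_rate(cases: list[EvalCase]) -> float:
    """End-to-end metric: how often the final answer matches the expected one."""
    hits = sum(normalize(c.model_answer) == normalize(c.expected_answer) for c in cases)
    return hits / len(cases)


# Hypothetical results, for illustration only.
cases = [
    EvalCase("d1", ["d1", "d4"], "30 days", "30 days"),
    EvalCase("d2", ["d2"], "3 to 5 business days", "2 days"),  # retrieved, still wrong
    EvalCase("d3", ["d3", "d1"], "weekdays", "Weekdays "),
    EvalCase("d5", ["d1"], "yes", "no"),                       # retrieval miss
]
hit_rate = retrieval_hit_rate(cases)
success_rate = task_success_rate(cases)
```

In this made-up set the retriever finds the right document in 3 of 4 cases, yet only 2 of 4 answers are correct, which is the kind of discrepancy a component-only metric would hide.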