Fine-Tuning vs RAG vs Prompt Engineering
Three approaches to customizing AI — when each works, the cost and complexity tradeoffs, and which to try first.
An AI model out of the box is good at general tasks. But you want it to be good at your tasks — writing code in your style, following your conventions, referencing your documentation, and understanding your domain.
There are three approaches to customizing AI behavior, and they sit on a spectrum of complexity:
Simple ──────────────────────────────────── Complex
Prompt Engineering    →    RAG    →    Fine-Tuning
Low cost                                  High cost
Minutes to set up                  Weeks to set up
Most flexible                      Most specialized

Most developers should start on the left and only move right when they hit a genuine limitation. Let's understand what each approach does and when it's the right choice.
Prompt Engineering — The 80% Solution
What it is: Crafting your instructions (prompts) to guide the model's behavior. This includes system prompts, example-based instructions, and structured templates.
How it works: You don't change the model at all. You change what you tell the model. A well-crafted prompt can make a general-purpose model behave like a domain specialist.
# System Prompt Example
You are a TypeScript developer working on a Next.js application.
## Code Style
- Use functional components with hooks
- Always use TypeScript strict mode
- Prefer named exports over default exports
- Use Zod for all input validation
## Response Format
- Explain your approach in 1-2 sentences
- Show the code
- Note any assumptions you made
## Do NOT
- Use the "any" type
- Use default exports
- Write classes (use functions and hooks instead)Strengths:
- Instant to implement — change the prompt, get different behavior
- No technical infrastructure required
- Easy to iterate — try a prompt, see the result, adjust
- Works with any model from any provider
- CLAUDE.md files, Cursor rules, and system prompts are all prompt engineering
Weaknesses:
- Uses context window space (every instruction is tokens)
- The model might not follow complex instructions perfectly
- Can't teach the model genuinely new knowledge (only direct its existing knowledge)
When to use it: Always try prompt engineering first. For most coding tasks, a well-written CLAUDE.md file and clear prompts are sufficient. Move to RAG or fine-tuning only when prompt engineering hits a wall.
Real cost: Free (part of your normal API usage).
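Conventions like the example above often live in a CLAUDE.md file, but they can also be assembled programmatically so rules are defined once and reused across tools. A minimal sketch — the `Conventions` type and `buildSystemPrompt` helper are hypothetical, not any real API:

```typescript
// Hypothetical sketch: compose a system prompt from structured project
// conventions instead of retyping them in every request.
type Conventions = {
  role: string;
  rules: string[];
  forbidden: string[];
};

function buildSystemPrompt(c: Conventions): string {
  return [
    `You are ${c.role}.`,
    "",
    "## Code Style",
    ...c.rules.map((r) => `- ${r}`),
    "",
    "## Do NOT",
    ...c.forbidden.map((r) => `- ${r}`),
  ].join("\n");
}

const prompt = buildSystemPrompt({
  role: "a TypeScript developer working on a Next.js application",
  rules: [
    "Use functional components with hooks",
    "Use Zod for all input validation",
  ],
  forbidden: ['Use the "any" type', "Use default exports"],
});
```

The resulting string is sent as the system prompt on each API call — the model itself is never modified.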
RAG — Retrieval-Augmented Generation
What it is: Instead of putting all your context in the prompt upfront, RAG dynamically retrieves relevant information and injects it into the prompt at query time.
How it works:
1. You have a knowledge base (docs, code, wiki, etc.)
2. The knowledge base is indexed (chunked and embedded)
3. When a question comes in, the system searches the index for the most relevant chunks
4. Those chunks are inserted into the prompt as context
5. The model generates a response using the retrieved context

Think of it as giving the model a search engine. Instead of relying only on its training data, the model can look up specific, current information from your sources.
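The retrieval steps (2-4) can be sketched with a toy "embedding" based on word counts. A real system would call an embedding model and a vector database instead, and the sample chunks below are invented for illustration:

```typescript
// Toy retrieval sketch: word-count vectors stand in for real embeddings.
function embed(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const w of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    v.set(w, (v.get(w) ?? 0) + 1);
  }
  return v;
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [w, x] of a) { dot += x * (b.get(w) ?? 0); na += x * x; }
  for (const [, y] of b) nb += y * y;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Step 2: index the knowledge base (illustrative chunks).
const chunks = [
  "Authentication is handled by Clerk; configure it in src/middleware.ts.",
  "Database migrations run via the migration CLI.",
  "Deployments go through preview environments.",
];
const index = chunks.map((c) => ({ c, v: embed(c) }));

// Step 3: rank chunks against the incoming question.
function retrieve(query: string, k = 1): string[] {
  const qv = embed(query);
  return [...index]
    .sort((a, b) => cosine(qv, b.v) - cosine(qv, a.v))
    .slice(0, k)
    .map((x) => x.c);
}

// Step 4: the top chunk is injected into the prompt as context.
const context = retrieve("How do I configure authentication?");
```

The design point: retrieval ranks by meaning-adjacent similarity, so the model sees only the handful of chunks relevant to this query rather than the whole knowledge base.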
Without RAG:
User: "How do I configure authentication in our app?"
Model: [Gives generic auth advice based on training data]
With RAG:
User: "How do I configure authentication in our app?"
System: [Searches your docs → finds auth setup guide]
Model: [Gives specific advice based on YOUR auth documentation]
"In your project, authentication is handled by Clerk.
Configure it in src/middleware.ts with the clerkMiddleware()
function. See the setup guide at docs/auth-setup.md..."

Strengths:
- The model can access knowledge it wasn't trained on (your internal docs, recent changes)
- Knowledge stays current — update the source documents, and the model's answers update
- Scales to large knowledge bases (thousands of documents)
- No model training required — you keep using the same model
Weaknesses:
- Requires infrastructure (vector database, embedding pipeline, retrieval system)
- Retrieved context quality varies — bad retrieval = bad answers
- Adds latency (search step before generation)
- Chunks of context may lose important surrounding information
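The last weakness — chunks losing their surrounding information — is commonly mitigated by overlapping chunks, so each chunk carries part of its neighbors' text. A minimal character-based sketch; real pipelines usually chunk by tokens and split along document structure:

```typescript
// Sketch: fixed-size chunking with overlap. Sizes are in characters
// here for simplicity; production systems typically count tokens.
function chunk(text: string, size = 200, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// A 500-character document yields 3 chunks; adjacent chunks
// share 50 characters, so sentences spanning a boundary survive.
const parts = chunk("A".repeat(500));
```

Larger overlap reduces boundary loss at the cost of more storage and more (partially duplicated) retrieved context.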
When to use it:
- Your team has significant internal documentation that the model doesn't know about
- You need the AI to reference current, project-specific information
- You're building a tool that answers questions about your codebase or documentation
- The knowledge base changes frequently (prompt engineering can't keep up)
Real cost: Low-to-medium. Vector databases (Pinecone, Weaviate, pgvector) have costs, and embedding documents requires API calls. For a small-to-medium knowledge base, expect $20-$100/month.
In practice for coding: MCP servers and Claude Code's file reading are a form of RAG — the agent retrieves relevant files and includes them as context. The difference is that formal RAG systems use semantic search (meaning-based) rather than file-path-based retrieval.
Fine-Tuning — The Custom Model
What it is: Training a model on your specific data to change its default behavior. The model itself is modified to produce different outputs.
How it works:
1. Collect training examples (hundreds to thousands)
Input: "Write a database query for user lookup"
Output: [Code that follows YOUR specific patterns]
2. Train the model on these examples
(The provider runs the training on their infrastructure)
3. Deploy the fine-tuned model
(New model endpoint that reflects your training data)

After fine-tuning, the model inherently produces output matching your style and patterns, without needing instructions in the prompt.
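Step 1 usually means assembling a JSONL file of examples. The chat-message shape below follows the format used by some hosted fine-tuning APIs (e.g. OpenAI's), but check your provider's documentation for the exact schema it expects — the example content is invented:

```typescript
// Sketch: building fine-tuning examples as JSONL (one JSON object per
// line). The messages shape is an assumption based on common hosted
// fine-tuning formats; verify against your provider's docs.
type Example = {
  messages: { role: "system" | "user" | "assistant"; content: string }[];
};

const examples: Example[] = [
  {
    messages: [
      { role: "system", content: "You write TypeScript in our house style." },
      { role: "user", content: "Write a database query for user lookup" },
      // The assistant message is the output you want the model to learn:
      { role: "assistant", content: "export const findUser = /* your real pattern */" },
    ],
  },
  // ...hundreds to thousands more, covering the outputs you want
];

const jsonl = examples.map((e) => JSON.stringify(e)).join("\n");
```

Quality matters more than quantity here: every example teaches the model a habit, good or bad.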
Strengths:
- Changes the model's default behavior (no prompt instructions needed)
- Can encode specific coding styles, patterns, and domain knowledge
- Reduces prompt size (conventions are baked in, not explained each time)
- Consistent output quality for the trained domain
Weaknesses:
- Expensive — training costs hundreds to thousands of dollars
- Slow — training takes hours to days
- Requires high-quality training data (garbage in, garbage out)
- Can degrade general capabilities (the model gets better at your specific task but may get worse at others)
- Only available for certain models and providers
- Needs retraining when your conventions change
When to use it:
- You have thousands of examples of the exact output you want
- Prompt engineering can't capture the nuance of your coding style
- You're making high-volume API calls and want to reduce per-call token costs (shorter prompts)
- You need the model to know domain-specific terminology or patterns that general training doesn't cover
Real cost: High. Training costs vary, but expect $100-$2,000 per training run, plus the ongoing cost of using the fine-tuned model.
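A back-of-envelope calculation helps sanity-check the "high-volume" case. The token counts and per-token price below are illustrative assumptions, not real pricing:

```typescript
// When do shorter prompts pay for a fine-tune? (All numbers assumed.)
const promptTokensSaved = 800;            // conventions no longer sent per call
const callsPerMonth = 500_000;            // high-volume production workload
const dollarsPerMillionInputTokens = 3;   // illustrative input-token price

const monthlySavings =
  (promptTokensSaved * callsPerMonth / 1_000_000) * dollarsPerMillionInputTokens;
// 400M fewer input tokens per month → $1,200/month, which can justify a
// $100-$2,000 training run. At 5,000 calls/month the saving is ~$12 —
// nowhere near enough.
```

If the arithmetic only works at volumes you don't have, stay with prompt engineering or RAG.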
The Decision Framework
Start here → Prompt Engineering
                  ↓
      Does it work well enough?
        ↓ Yes            ↓ No
       Done!    Is the problem that the model
                doesn't have the right information?
                   ↓ Yes         ↓ No
                 Use RAG    Is the problem that the
                            model's style is wrong?
                                 ↓ Yes
                              Fine-tune

In practice, most developers never need to leave prompt engineering. A well-written CLAUDE.md file, combined with MCP servers for accessing project-specific information, covers the vast majority of coding use cases.
RAG is the right step when you're building a tool (like an internal documentation chatbot) that needs to reference a large, changing knowledge base.
Fine-tuning is for specialized, high-volume production use cases — not typical developer tooling.
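The framework above condenses into a small helper. The inputs are deliberately coarse diagnoses, so treat the output as a starting point rather than a verdict:

```typescript
// The decision framework as code. Inputs are simplified yes/no
// diagnoses; real decisions also weigh budget, volume, and maintenance.
type Diagnosis = {
  promptEngineeringWorks: boolean; // did a good prompt get close enough?
  missingInformation: boolean;     // does the model lack the right facts?
  wrongDefaultStyle: boolean;      // is the model's style itself the problem?
};

function recommend(d: Diagnosis): "prompt engineering" | "RAG" | "fine-tuning" {
  if (d.promptEngineeringWorks) return "prompt engineering";
  if (d.missingInformation) return "RAG";
  if (d.wrongDefaultStyle) return "fine-tuning";
  // Neither diagnosis fits: keep iterating on prompts before adding machinery.
  return "prompt engineering";
}
```

Note the ordering: information problems route to RAG before style problems route to fine-tuning, mirroring the cost gradient.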
Combining Approaches
These approaches aren't mutually exclusive. The most sophisticated systems layer them:
Fine-tuned model (knows your coding conventions)
+ RAG (retrieves relevant documentation)
+ Prompt engineering (specific task instructions)

But don't build this from day one. Start with prompt engineering. Add RAG when the model genuinely needs information it doesn't have. Consider fine-tuning only when you've exhausted the first two options and have a clear, data-backed justification.
Try this now
- Solve your current customization problem with prompt engineering first and see if that is enough.
- Write down whether your issue is "the model lacks information" or "the model has the wrong default style."
- Estimate the size and change rate of the knowledge base before you talk yourself into RAG.
Prompt to give your agent
"Help me choose between prompt engineering, RAG, and fine-tuning for this use case. Problem: [describe it] Knowledge base size: [small, medium, large] How often it changes: [frequency] Query volume: [approximate] Budget and complexity tolerance: [describe]
Recommend the simplest approach that is likely to work, explain why, and tell me what evidence would justify moving to the next level."
What you must review yourself
- Whether prompt engineering has genuinely failed before you add more machinery
- Whether the real problem is missing information, inconsistent style, or both
- Whether you have a credible evaluation method for "better" before spending money or building pipelines
- Whether the maintenance burden of RAG or fine-tuning is justified by the use case
Common Mistakes to Avoid
- Jumping to fine-tuning first. The most expensive option should not be the default experiment.
- Using RAG when a better prompt would work. Retrieval adds complexity that small knowledge bases may not need.
- Poor training data for fine-tuning. Bad examples become expensive bad habits.
- Not evaluating results. Improvement should be measured, not assumed.
- Over-engineering the solution. Many convention problems are solved by a solid context file.
Key takeaways
- Prompt engineering is the right first move for most teams
- RAG is about retrieving changing information, not about looking sophisticated
- Fine-tuning is justified only for narrow, repeated, high-value problems
- Escalate complexity only after you can prove the simpler option failed
What's Next
The final piece of AI infrastructure: running models locally on your own machine. We'll cover Ollama, LM Studio, and the tradeoffs between local and cloud models — when local makes sense and when it doesn't.