Fine-Tuning vs RAG vs Prompt Engineering
Three approaches to customizing AI — when each works, the cost and complexity tradeoffs, and which to try first.
An AI model out of the box is good at general tasks. But you want it to be good at your tasks — writing code in your style, following your conventions, referencing your documentation, and understanding your domain.
There are three approaches to customizing AI behavior, and they sit on a spectrum of complexity:
Simple ──────────────────────────────────── Complex
Prompt Engineering    →    RAG    →    Fine-Tuning
Low cost                                  High cost
Minutes to set up                  Weeks to set up
Most flexible                      Most specialized

Most developers should start on the left and only move right when they hit a genuine limitation. Let's understand what each approach does and when it's the right choice.
Prompt Engineering — The 80% Solution
What it is: Crafting your instructions (prompts) to guide the model's behavior. This includes system prompts, example-based instructions, and structured templates.
How it works: You don't change the model at all. You change what you tell the model. A well-crafted prompt can make a general-purpose model behave like a domain specialist.
# System Prompt Example
You are a TypeScript developer working on a Next.js application.
## Code Style
- Use functional components with hooks
- Always use TypeScript strict mode
- Prefer named exports over default exports
- Use Zod for all input validation
## Response Format
- Explain your approach in 1-2 sentences
- Show the code
- Note any assumptions you made
## Do NOT
- Use the "any" type
- Use default exports
- Write classes (use functions and hooks instead)Strengths:
- Instant to implement — change the prompt, get different behavior
- No technical infrastructure required
- Easy to iterate — try a prompt, see the result, adjust
- Works with any model from any provider
- CLAUDE.md files, Cursor rules, and system prompts are all prompt engineering
Weaknesses:
- Uses context window space (every instruction is tokens)
- The model might not follow complex instructions perfectly
- Can't teach the model genuinely new knowledge (only direct its existing knowledge)
When to use it: Always try prompt engineering first. For most coding tasks, a well-written CLAUDE.md file and clear prompts are sufficient. Move to RAG or fine-tuning only when prompt engineering hits a wall.
Real cost: Free (part of your normal API usage).
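Conventions like the example above often live in a CLAUDE.md file, but they can also be assembled programmatically so rules are defined once and reused across tools. A minimal sketch — the `Conventions` type and `buildSystemPrompt` helper are hypothetical, not any real API:

```typescript
// Hypothetical sketch: compose a system prompt from structured project
// conventions instead of retyping them in every request.
type Conventions = {
  role: string;
  rules: string[];
  forbidden: string[];
};

function buildSystemPrompt(c: Conventions): string {
  return [
    `You are ${c.role}.`,
    "",
    "## Code Style",
    ...c.rules.map((r) => `- ${r}`),
    "",
    "## Do NOT",
    ...c.forbidden.map((r) => `- ${r}`),
  ].join("\n");
}

const prompt = buildSystemPrompt({
  role: "a TypeScript developer working on a Next.js application",
  rules: [
    "Use functional components with hooks",
    "Use Zod for all input validation",
  ],
  forbidden: ['Use the "any" type', "Use default exports"],
});
```

The resulting string is sent as the system prompt on each API call — the model itself is never modified.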
RAG — Retrieval-Augmented Generation
What it is: Instead of putting all your context in the prompt upfront, RAG dynamically retrieves relevant information and injects it into the prompt at query time.
How it works:
1. You have a knowledge base (docs, code, wiki, etc.)
2. The knowledge base is indexed (chunked and embedded)
3. When a question comes in, the system searches the index for the most relevant chunks
4. Those chunks are inserted into the prompt as context
5. The model generates a response using the retrieved context

Think of it as giving the model a search engine. Instead of relying only on its training data, the model can look up specific, current information from your sources.
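The retrieval steps (2-4) can be sketched with a toy "embedding" based on word counts. A real system would call an embedding model and a vector database instead, and the sample chunks below are invented for illustration:

```typescript
// Toy retrieval sketch: word-count vectors stand in for real embeddings.
function embed(text: string): Map<string, number> {
  const v = new Map<string, number>();
  for (const w of text.toLowerCase().match(/[a-z]+/g) ?? []) {
    v.set(w, (v.get(w) ?? 0) + 1);
  }
  return v;
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0, na = 0, nb = 0;
  for (const [w, x] of a) { dot += x * (b.get(w) ?? 0); na += x * x; }
  for (const [, y] of b) nb += y * y;
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Step 2: index the knowledge base (illustrative chunks).
const chunks = [
  "Authentication is handled by Clerk; configure it in src/middleware.ts.",
  "Database migrations run via the migration CLI.",
  "Deployments go through preview environments.",
];
const index = chunks.map((c) => ({ c, v: embed(c) }));

// Step 3: rank chunks against the incoming question.
function retrieve(query: string, k = 1): string[] {
  const qv = embed(query);
  return [...index]
    .sort((a, b) => cosine(qv, b.v) - cosine(qv, a.v))
    .slice(0, k)
    .map((x) => x.c);
}

// Step 4: the top chunk is injected into the prompt as context.
const context = retrieve("How do I configure authentication?");
```

The design point: retrieval ranks by meaning-adjacent similarity, so the model sees only the handful of chunks relevant to this query rather than the whole knowledge base.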
Without RAG:
User: "How do I configure authentication in our app?"
Model: [Gives generic auth advice based on training data]
With RAG:
User: "How do I configure authentication in our app?"
System: [Searches your docs → finds auth setup guide]
Model: [Gives specific advice based on YOUR auth documentation]
"In your project, authentication is handled by Clerk.
Configure it in src/middleware.ts with the clerkMiddleware()
function. See the setup guide at docs/auth-setup.md..."

Strengths:
- The model can access knowledge it wasn't trained on (your internal docs, recent changes)
- Knowledge stays current — update the source documents, and the model's answers update
- Scales to large knowledge bases (thousands of documents)
- No model training required — you keep using the same model
Weaknesses:
- Requires infrastructure (vector database, embedding pipeline, retrieval system)
- Retrieved context quality varies — bad retrieval = bad answers
- Adds latency (search step before generation)
- Chunks of context may lose important surrounding information
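The last weakness — chunks losing their surrounding information — is commonly mitigated by overlapping chunks, so each chunk carries part of its neighbors' text. A minimal character-based sketch; real pipelines usually chunk by tokens and split along document structure:

```typescript
// Sketch: fixed-size chunking with overlap. Sizes are in characters
// here for simplicity; production systems typically count tokens.
function chunk(text: string, size = 200, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// A 500-character document yields 3 chunks; adjacent chunks
// share 50 characters, so sentences spanning a boundary survive.
const parts = chunk("A".repeat(500));
```

Larger overlap reduces boundary loss at the cost of more storage and more (partially duplicated) retrieved context.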
When to use it:
- Your team has significant internal documentation that the model doesn't know about
- You need the AI to reference current, project-specific information
- You're building a tool that answers questions about your codebase or documentation
- The knowledge base changes frequently (prompt engineering can't keep up)
Real cost: Low-to-medium. Vector databases (Pinecone, Weaviate, pgvector) have costs, and embedding documents requires API calls. For a small-to-medium knowledge base, expect $20-$100/month.
In practice for coding: MCP servers and Claude Code's file reading are a form of RAG — the agent retrieves relevant files and includes them as context. The difference is that formal RAG systems use semantic search (meaning-based) rather than file-path-based retrieval.
Fine-Tuning — The Custom Model
What it is: Training a model on your specific data to change its default behavior. The model itself is modified to produce different outputs.
How it works:
1. Collect training examples (hundreds to thousands)
Input: "Write a database query for user lookup"
Output: [Code that follows YOUR specific patterns]
2. Train the model on these examples
(The provider runs the training on their infrastructure)
3. Deploy the fine-tuned model
(New model endpoint that reflects your training data)

After fine-tuning, the model inherently produces output matching your style and patterns, without needing instructions in the prompt.
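Step 1 usually means assembling a JSONL file of examples. The chat-message shape below follows the format used by some hosted fine-tuning APIs (e.g. OpenAI's), but check your provider's documentation for the exact schema it expects — the example content is invented:

```typescript
// Sketch: building fine-tuning examples as JSONL (one JSON object per
// line). The messages shape is an assumption based on common hosted
// fine-tuning formats; verify against your provider's docs.
type Example = {
  messages: { role: "system" | "user" | "assistant"; content: string }[];
};

const examples: Example[] = [
  {
    messages: [
      { role: "system", content: "You write TypeScript in our house style." },
      { role: "user", content: "Write a database query for user lookup" },
      // The assistant message is the output you want the model to learn:
      { role: "assistant", content: "export const findUser = /* your real pattern */" },
    ],
  },
  // ...hundreds to thousands more, covering the outputs you want
];

const jsonl = examples.map((e) => JSON.stringify(e)).join("\n");
```

Quality matters more than quantity here: every example teaches the model a habit, good or bad.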
Strengths:
- Changes the model's default behavior (no prompt instructions needed)
- Can encode specific coding styles, patterns, and domain knowledge
- Reduces prompt size (conventions are baked in, not explained each time)
- Consistent output quality for the trained domain
Weaknesses:
- Expensive — training costs hundreds to thousands of dollars
- Slow — training takes hours to days
- Requires high-quality training data (garbage in, garbage out)
- Can degrade general capabilities (the model gets better at your specific task but may get worse at others)
- Only available for certain models and providers
- Needs retraining when your conventions change
When to use it:
- You have thousands of examples of the exact output you want
- Prompt engineering can't capture the nuance of your coding style
- You're making high-volume API calls and want to reduce per-call token costs (shorter prompts)
- You need the model to know domain-specific terminology or patterns that general training doesn't cover
Real cost: High. Training costs vary, but expect $100-$2,000 per training run, plus the ongoing cost of using the fine-tuned model.
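A back-of-envelope calculation helps sanity-check the "high-volume" case. The token counts and per-token price below are illustrative assumptions, not real pricing:

```typescript
// When do shorter prompts pay for a fine-tune? (All numbers assumed.)
const promptTokensSaved = 800;            // conventions no longer sent per call
const callsPerMonth = 500_000;            // high-volume production workload
const dollarsPerMillionInputTokens = 3;   // illustrative input-token price

const monthlySavings =
  (promptTokensSaved * callsPerMonth / 1_000_000) * dollarsPerMillionInputTokens;
// 400M fewer input tokens per month → $1,200/month, which can justify a
// $100-$2,000 training run. At 5,000 calls/month the saving is ~$12 —
// nowhere near enough.
```

If the arithmetic only works at volumes you don't have, stay with prompt engineering or RAG.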
The Decision Framework
Start here → Prompt Engineering
                  ↓
      Does it work well enough?
        ↓ Yes            ↓ No
       Done!    Is the problem that the model
                doesn't have the right information?
                   ↓ Yes         ↓ No
                 Use RAG    Is the problem that the
                            model's style is wrong?
                                 ↓ Yes
                              Fine-tune

In practice, most developers never need to leave prompt engineering. A well-written CLAUDE.md file, combined with MCP servers for accessing project-specific information, covers the vast majority of coding use cases.
RAG is the right step when you're building a tool (like an internal documentation chatbot) that needs to reference a large, changing knowledge base.
Fine-tuning is for specialized, high-volume production use cases — not typical developer tooling.
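The framework above condenses into a small helper. The inputs are deliberately coarse diagnoses, so treat the output as a starting point rather than a verdict:

```typescript
// The decision framework as code. Inputs are simplified yes/no
// diagnoses; real decisions also weigh budget, volume, and maintenance.
type Diagnosis = {
  promptEngineeringWorks: boolean; // did a good prompt get close enough?
  missingInformation: boolean;     // does the model lack the right facts?
  wrongDefaultStyle: boolean;      // is the model's style itself the problem?
};

function recommend(d: Diagnosis): "prompt engineering" | "RAG" | "fine-tuning" {
  if (d.promptEngineeringWorks) return "prompt engineering";
  if (d.missingInformation) return "RAG";
  if (d.wrongDefaultStyle) return "fine-tuning";
  // Neither diagnosis fits: keep iterating on prompts before adding machinery.
  return "prompt engineering";
}
```

Note the ordering: information problems route to RAG before style problems route to fine-tuning, mirroring the cost gradient.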
Combining Approaches
These approaches aren't mutually exclusive. The most sophisticated systems layer them:
Fine-tuned model (knows your coding conventions)
+ RAG (retrieves relevant documentation)
+ Prompt engineering (specific task instructions)

But don't build this from day one. Start with prompt engineering. Add RAG when the model genuinely needs information it doesn't have. Consider fine-tuning only when you've exhausted the first two options and have a clear, data-backed justification.
Try this now
- Solve your current customization problem with prompt engineering first and see if that is enough.
- Write down whether your issue is "the model lacks information" or "the model has the wrong default style."
- Estimate the size and change rate of the knowledge base before you talk yourself into RAG.
Prompt to give your agent
"Help me choose between prompt engineering, RAG, and fine-tuning for this use case. Problem: [describe it] Knowledge base size: [small, medium, large] How often it changes: [frequency] Query volume: [approximate] Budget and complexity tolerance: [describe]
Recommend the simplest approach that is likely to work, explain why, and tell me what evidence would justify moving to the next level."
What you must review yourself
- Whether prompt engineering has genuinely failed before you add more machinery
- Whether the real problem is missing information, inconsistent style, or both
- Whether you have a credible evaluation method for "better" before spending money or building pipelines
- Whether the maintenance burden of RAG or fine-tuning is justified by the use case
Common Mistakes to Avoid
- Jumping to fine-tuning first. The most expensive option should not be the default experiment.
- Using RAG when a better prompt would work. Retrieval adds complexity that small knowledge bases may not need.
- Poor training data for fine-tuning. Bad examples become expensive bad habits.
- Not evaluating results. Improvement should be measured, not assumed.
- Over-engineering the solution. Many convention problems are solved by a solid context file.
Key takeaways
- Prompt engineering is the right first move for most teams
- RAG is about retrieving changing information, not about looking sophisticated
- Fine-tuning is justified only for narrow, repeated, high-value problems
- Escalate complexity only after you can prove the simpler option failed
What's Next
The final piece of AI infrastructure: running models locally on your own machine. We'll cover Ollama, LM Studio, and the tradeoffs between local and cloud models — when local makes sense and when it doesn't.