Adapt a small open model to your task using LoRA: prepare a clean instruction dataset, train lightweight adapters, and know when fine-tuning is the wrong tool entirely.
Step-by-step — built to follow along.
By the end of this tutorial you will understand the full workflow for fine-tuning a small open language model on your own examples using LoRA, the lightweight method that makes this feasible on modest hardware. You need a base model you have the right to train, a dataset of input-output examples in your target style or task, and a machine with a GPU (fine-tuning is far slower on CPU). Just as important, you need a clear reason to fine-tune at all, which is where we start.
Fine-tuning teaches a model a behavior, format, or style by example. It is the right tool when you need consistent structure, a specific tone, or a narrow task the base model handles clumsily. It is the wrong tool when you need the model to know new facts. For facts that change or that the model never saw, retrieval (RAG) is cheaper, faster to update, and easier to verify, because the answer can point back to the passage it came from. A useful rule: fine-tune for form, retrieve for knowledge. If you only have a handful of examples or your need is one-off, a good prompt usually beats training anything.
Full fine-tuning updates every weight in the model, which is expensive and produces a complete new copy of the weights. LoRA (Low-Rank Adaptation) freezes the original weights and trains a small set of new, low-rank matrices alongside them. You end up with a tiny adapter file, often a few megabytes, that layers on top of the untouched base model. This is why LoRA runs on consumer GPUs: you train a small fraction of the parameters, so gradients and optimizer state shrink with them, and the saved adapter is tiny compared with the base model.
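To make "a fraction of the parameters" concrete, here is a small arithmetic sketch for a single weight matrix. It assumes the usual LoRA shape, where a d_out x d_in weight gets two trainable factors, A (rank x d_in) and B (d_out x rank). The layer size and rank below are hypothetical values chosen for illustration, not figures from this tutorial.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> dict:
    """Compare trainable parameters for one weight matrix: full fine-tuning vs. LoRA."""
    full = d_in * d_out                # every entry of W is trainable
    lora = rank * d_in + d_out * rank  # A is (rank x d_in), B is (d_out x rank)
    return {"full": full, "lora": lora, "fraction": lora / full}


if __name__ == "__main__":
    # Hypothetical layer size and rank, for illustration only.
    counts = lora_param_counts(d_in=4096, d_out=4096, rank=8)
    print(counts)  # {'full': 16777216, 'lora': 65536, 'fraction': 0.00390625}
```

For this made-up layer the adapter trains about 0.4% of what full fine-tuning would. The exact ratio for a real model depends on which layers receive adapters and on the rank you pick.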
The dataset is where most of the outcome is decided. Each example is an instruction paired with the ideal response, stored as structured records:
{"instruction": "Summarize this support ticket in one line.", "input": "...", "output": "..."}
Three things matter more than volume. First, consistency: every output should follow the exact format and tone you want the model to learn, because the model imitates patterns, including your mistakes. Second, coverage: include the variety of inputs you will see in production, not just the easy cases. Third, cleanliness: a few hundred carefully written examples usually beat thousands of sloppy ones. Hold back a small slice as a validation set you never train on.
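The checks above can be partly automated. The following is a minimal sketch that assumes the records are stored one JSON object per line (JSON Lines) with the three fields shown in the example record. The function names, the 10% validation fraction, and the fixed seed are illustrative choices, not something the tutorial prescribes.

```python
import json
import random

# Field names taken from the example record; the storage format is an assumption.
REQUIRED_KEYS = ("instruction", "input", "output")


def load_records(lines):
    """Parse JSON Lines text and fail loudly on records with missing or empty fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        # "input" may legitimately be empty; instruction and output may not.
        if not record["instruction"].strip() or not record["output"].strip():
            raise ValueError(f"line {line_no}: empty instruction or output")
        records.append(record)
    return records


def split_train_validation(records, validation_fraction=0.1, seed=0):
    """Shuffle a copy deterministically and hold back a validation slice."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]
```

A script like this only catches structural problems. Consistency of tone and format still needs a human read-through of the outputs.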
A LoRA run has a handful of knobs that matter.
Start with conservative defaults and change one knob at a time so you can attribute any change in results.
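One way to enforce the one-knob-at-a-time discipline is to keep the run settings in a single immutable object. The sketch below is hypothetical: the field names follow common LoRA terminology (rank, alpha, dropout, learning rate, epochs), and the default values are placeholders for illustration, not recommendations from this tutorial.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class LoraRunConfig:
    # Illustrative starting values only; tune them for your model and data.
    rank: int = 8
    alpha: int = 16
    dropout: float = 0.05
    learning_rate: float = 2e-4
    epochs: int = 3


def vary_one(base: LoraRunConfig, **change) -> LoraRunConfig:
    """Return a copy of `base` with exactly one knob changed."""
    if len(change) != 1:
        raise ValueError("change exactly one knob per experiment")
    return replace(base, **change)


def changed_knobs(a: LoraRunConfig, b: LoraRunConfig) -> dict:
    """Map each differing field name to its (old, new) values, for logging."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


if __name__ == "__main__":
    baseline = LoraRunConfig()
    trial = vary_one(baseline, rank=16)
    print(changed_knobs(baseline, trial))  # {'rank': (8, 16)}
```

Logging the output of `changed_knobs` next to each run's validation loss makes it easy to attribute a result to the single setting that moved.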
Kick off training and watch two numbers: training loss and validation loss. Training loss should fall steadily. Validation loss should fall too, then flatten. The moment validation loss starts climbing while training loss keeps dropping, the model is overfitting (memorizing examples instead of learning the pattern) and you should stop. Save the adapter at the point validation loss was lowest, not at the final step.
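The stopping rule can be sketched as two small helpers over the list of validation losses recorded at each evaluation. The `patience` parameter is an added assumption: it tolerates a couple of noisy evaluations before stopping, rather than halting on the very first uptick. The loss values in the usage example are invented.

```python
def best_step(val_losses):
    """Index of the evaluation with the lowest validation loss (earliest on ties)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


def should_stop(val_losses, patience=2):
    """True once validation loss has not improved for `patience` evaluations in a row."""
    if not val_losses:
        return False
    evaluations_since_best = len(val_losses) - 1 - best_step(val_losses)
    return evaluations_since_best >= patience


if __name__ == "__main__":
    # Made-up validation losses: falling, flattening, then climbing.
    history = [1.9, 1.4, 1.1, 1.0, 1.05, 1.2]
    print(should_stop(history))  # True
    print(best_step(history))    # 3, so keep the adapter saved at that evaluation
```

This only works if you save an adapter checkpoint at every evaluation, so the one at `best_step` still exists when you decide to stop.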
When training finishes you have an adapter, not a full model. You can either load the base model and apply the adapter at inference time, or merge the adapter into the weights to produce a standalone model:
base_model + lora_adapter -> merged_model
Loading the adapter separately keeps the base model reusable across several adapters. Merging produces a single self-contained model that is simpler to deploy. Choose based on whether you will run multiple specializations off one base.
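The two options produce the same outputs, which a toy example can show. This sketch assumes the standard LoRA formulation, in which the adapter contributes (alpha / rank) * B * A on top of the frozen weight W. The scaling factor is not mentioned in this tutorial, and the tiny matrices are made-up numbers written in plain Python so the arithmetic is easy to follow.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def merge_lora(w, a, b, alpha, rank):
    """Fold the adapter into the base weights: W' = W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def adapter_forward(w, a, b, alpha, rank, x):
    """Base output plus the adapter's scaled low-rank correction, kept separate."""
    scale = alpha / rank
    base = matvec(w, x)
    correction = matvec(b, matvec(a, x))
    return [y + scale * c for y, c in zip(base, correction)]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
    a = [[1.0, 2.0]]              # adapter factor A (rank 1 x 2)
    b = [[0.5], [-1.0]]           # adapter factor B (2 x rank 1)
    x = [3.0, 4.0]
    print(adapter_forward(w, a, b, alpha=2, rank=1, x=x))     # [14.0, -18.0]
    print(matvec(merge_lora(w, a, b, alpha=2, rank=1), x))    # [14.0, -18.0]
```

Because the results match, the choice between the two is operational rather than a question of quality: keep A and B separate to swap adapters on one base, or merge once to ship a single set of weights.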
The biggest mistake is fine-tuning to inject facts. The model will absorb the wording of your training examples but it will not reliably learn the underlying facts, and it will confidently state outdated information once the world moves on. Use retrieval for knowledge and reserve fine-tuning for form and behavior.
The second trap is overfitting from too many epochs or too little data variety. A model that scores beautifully on your training examples and falls apart on real inputs has memorized, not learned. Trust the validation set, not the training loss.
Finally, garbage examples produce a garbage model with total confidence. The model cannot tell a careless label from a careful one; it imitates whatever you give it. Invest your time in a small, clean, consistent dataset before you touch a single training knob.

Adil R. · May 23, 2026 · 4 min read