A clean, ordered path to a working GPU stack for running models locally, plus the version-mismatch traps that quietly waste an afternoon.
Step-by-step — built to follow along.
Most "GPU not detected" errors are not hardware faults. They are a version mismatch between three layers that have to agree: the kernel driver, the compute toolkit, and the framework build you installed with pip. Get those three to line up once and the rest of local AI work stops fighting you. This walks the install in the order that avoids the usual breakage.
Before installing anything, see the current state. A half-installed older driver is the most common source of conflicts.
# Vendor GPU query tool reports driver + detected devices
nvidia-smi
# or, for other vendors, the equivalent vendor query utility
If the command is missing, you have no working driver yet. If it prints a driver version, write it down.
Use the vendor package channel or your distribution's repository, not a random binary. On Linux, prefer the packaged driver so kernel updates do not silently break it. Reboot after installing so the kernel module actually loads.
# Re-run the query after reboot. You want a version and a listed device.
nvidia-smi
If the device still does not appear, stop here. A toolkit installed on top of a broken driver will not help.
The driver is backward compatible with older toolkits, but your framework build is not flexible. Install the toolkit version the framework was compiled against, not the newest one available.
# Confirm the toolkit compiler is on PATH
nvcc --version
Frameworks ship separate builds per compute version. Pick the one whose tag matches Step 3. Installing the default CPU wheel by accident is a frequent and silent mistake.
# Verify the framework actually sees the GPU
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
A True and your device name means the full stack agrees. That is the goal.
Allocate a tensor on the device and do one operation. If this runs without an out-of-memory or driver error, you are done.
python -c "import torch; x = torch.randn(1000, 1000, device='cuda'); print((x @ x).sum().item())"
is_available() returning false. Match versions deliberately.rm -rf away, not a full OS reinstall.nvcc is missing but installed, the toolkit's bin directory is not on your PATH. Add it and reopen the shell.Once the three layers agree, treat that combination as load bearing. Pin the versions in your environment file so the next machine reproduces it instead of rediscovering the same afternoon of debugging. Write down the exact driver, toolkit, and framework versions that worked, because the next time something breaks, the first useful question is always whether one of them moved. A reproducible record turns a vague guessing session into a quick diff against a known-good setup.

Stop guessing from vibes. A repeatable way to decide if a model clears the bar for your specific job, using your own data.
BitByteCore Research · May 19, 2026 · 3 min read
Discussion