Set up GPU drivers and toolkit for local AI work

Adil R. · May 20, 2026 · 3 min

A clean, ordered path to a working GPU stack for running models locally, plus the version-mismatch traps that quietly waste an afternoon.

Step-by-step — built to follow along.

Signalforming


Prerequisites

A discrete GPU the vendor still supports, with enough VRAM for your target model (check the model card before you start).

Admin rights on the machine, and a terminal you are comfortable in.

A clean Python environment manager (venv or conda). Never install ML packages into the system Python.

The framework build you need (for example, a specific PyTorch build) and the compute toolkit version it was compiled against. Writing this single fact down prevents most failures.
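You can check two of these prerequisites in a few lines before installing anything. This is a minimal sketch, not a sizing tool. It assumes weight memory is roughly the parameter count times the bytes per parameter. The 20% overhead for activations and cache is an assumed placeholder, not a measured figure. The environment check counts a venv (where `sys.prefix` differs from `sys.base_prefix`) or an activated conda environment other than `base` as isolated. The helper names are hypothetical.

```python
import os
import sys

# Bytes per parameter for common weight formats. int4 packs two weights per
# byte; real quantization schemes add small per-block scale overhead not counted here.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def estimate_vram_gb(num_params: float, dtype: str, overhead: float = 0.2) -> float:
    """Rough VRAM need in GB (1 GB = 1e9 bytes).

    Weights = parameters x bytes per parameter; `overhead` is an ASSUMED
    fraction for activations, KV cache, and framework buffers.
    """
    weight_bytes = num_params * BYTES_PER_PARAM[dtype]
    return weight_bytes * (1.0 + overhead) / 1e9


def in_isolated_env() -> bool:
    """True inside a venv or an activated conda env other than 'base'."""
    if sys.prefix != sys.base_prefix:  # a venv points sys.prefix at the env
        return True
    conda_env = os.environ.get("CONDA_DEFAULT_ENV")
    return conda_env is not None and conda_env != "base"


if __name__ == "__main__":
    # Hypothetical 7B-parameter model stored in fp16.
    print(f"about {estimate_vram_gb(7e9, 'fp16'):.1f} GB of VRAM")
    print("isolated environment:", in_isolated_env())
```

Take a 7-billion-parameter model in fp16. That works out to 14 GB of weights, or about 16.8 GB with the assumed overhead. Quantized formats shrink the figure and longer contexts grow it, so treat the model card as the authority.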

Step 1: Check what is already there

Before installing anything, see the current state. A half-installed older driver is the most common source of conflicts.

# Vendor GPU query tool reports driver + detected devices
nvidia-smi
# or, for other vendors, the equivalent vendor query utility (for example rocm-smi on AMD)

If the command is missing, you have no working driver yet. If it prints a driver version, write it down.
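To make "write it down" repeatable, you can ask the query tool for machine-readable output. This minimal sketch is NVIDIA-only and uses nvidia-smi's `--query-gpu` and `--format=csv,noheader` options. Other vendors' tools use different flags. The helper names `parse_gpu_query` and `query_gpus` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional, Tuple


def parse_gpu_query(csv_text: str) -> List[Tuple[str, str]]:
    """Parse 'driver_version, name' lines (one per GPU) from csv,noheader output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        driver, name = [field.strip() for field in line.split(",", 1)]
        gpus.append((driver, name))
    return gpus


def query_gpus() -> Optional[List[Tuple[str, str]]]:
    """Return [(driver_version, gpu_name), ...] or None if no working driver."""
    if shutil.which("nvidia-smi") is None:
        return None  # query tool missing: no driver installed yet
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version,name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # tool present, but the driver cannot talk to a device
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    print(query_gpus())
```

On a working machine this returns one tuple per GPU. A missing tool or a nonzero exit code returns None, which matches the "no working driver yet" case above.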

Step 2: Install or update the kernel driver

Use the vendor package channel or your distribution's repository, not a random binary. On Linux, prefer the packaged driver so kernel updates do not silently break it. Reboot after installing so the kernel module actually loads.

# Re-run the query after reboot. You want a version and a listed device.
nvidia-smi

If the device still does not appear, stop here. A toolkit installed on top of a broken driver will not help.
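On Linux, you can also confirm that the kernel module itself loaded after the reboot, independent of the query tool. This minimal sketch assumes the NVIDIA module name `nvidia` (AMD's is `amdgpu`). It reads `/proc/modules`, where the first field of each line is the name of a loaded module. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def module_in_listing(module_name: str, modules_text: str) -> bool:
    """True if any /proc/modules line's first field is exactly `module_name`."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == module_name:
            return True
    return False


def kernel_module_loaded(module_name: str = "nvidia") -> Optional[bool]:
    """Linux only. None means the check could not run (no readable /proc/modules)."""
    try:
        text = Path("/proc/modules").read_text()
    except OSError:  # non-Linux system or restricted container
        return None
    return module_in_listing(module_name, text)


if __name__ == "__main__":
    print("nvidia module loaded:", kernel_module_loaded())
```

The check matches the whole first field rather than a prefix. That way, a query for one module is never satisfied by a differently named one such as `nvidia_drm`.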

Step 4: Install the framework build that matches

Frameworks ship separate builds per compute version. Pick the one whose tag matches the toolkit version from Step 3 (for PyTorch, a tag such as cu121). Installing the default CPU wheel by accident is a frequent and silent mistake.

# Verify the framework actually sees the GPU
python -c "import torch; ok = torch.cuda.is_available(); print(ok, torch.cuda.get_device_name(0) if ok else 'no GPU visible')"

A True and your device name means the full stack agrees. That is the goal.
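If the check prints False, first confirm which build you actually installed. This minimal sketch assumes PyTorch. Builds from the per-toolkit indexes carry a local version tag such as `2.3.0+cu121` or `2.3.0+cpu`. Wheels from the default index may carry no tag at all. For that case, the second helper asks torch directly through `torch.version.cuda`, which is None on CPU-only builds. Both helper names are hypothetical.

```python
import re
from typing import Optional


def cuda_from_tag(version: str) -> Optional[str]:
    """'2.3.0+cu121' -> '12.1'; '+cpu' or untagged versions -> None."""
    match = re.search(r"\+cu(\d{2,3})$", version)
    if match is None:
        return None
    digits = match.group(1)
    return f"{digits[:-1]}.{digits[-1]}"  # last digit is the minor version


def installed_build_cuda() -> Optional[str]:
    """CUDA version the installed torch was built with, or None.

    torch.version.cuda is None on CPU-only (and ROCm) builds.
    """
    try:
        import torch
    except ImportError:
        return None  # framework not installed in this environment
    return torch.version.cuda


if __name__ == "__main__":
    print(installed_build_cuda())
```

A None from `installed_build_cuda()` inside the environment you meant to use is the CPU-wheel mistake described above.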

Pitfalls

Newest is not safest. Installing the latest toolkit when your framework expects an older one is the number one cause of is_available() returning False. Match versions deliberately.

Mixed install sources. A driver from one channel plus a toolkit from another often disagree. Pick one source and stay on it.

Skipping the reboot. The kernel module loads at boot. A driver that "installed fine" but is not detected usually just needs a restart.

Polluting the base environment. Always install into a fresh virtual environment so a broken attempt is an rm -rf away, not a full OS reinstall.

Ignoring VRAM limits. A correct stack still fails with out-of-memory if the model does not fit. Check the model's memory footprint before blaming the install.

Forgetting PATH. If the shell cannot find nvcc even though the toolkit is installed, the toolkit's bin directory is not on your PATH. Add it and reopen the shell.
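Two of these pitfalls reduce to checks you can script. This is a minimal sketch. The compatibility rule is a conservative rule of thumb: the framework build's CUDA version should not exceed the "CUDA Version" that nvidia-smi reports for the driver. Versions are compared as integer tuples, because plain string comparison ranks "9.2" above "12.1". The function names are hypothetical.

```python
import shutil
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """'12.1' -> (12, 1), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_covers_build(driver_max_cuda: str, build_cuda: str) -> bool:
    """Conservative check: the build's CUDA version must not exceed the
    highest CUDA version the driver reports in nvidia-smi's header."""
    return parse_version(build_cuda) <= parse_version(driver_max_cuda)


def nvcc_path() -> Optional[str]:
    """Full path to nvcc if the toolkit's bin directory is on PATH, else None."""
    return shutil.which("nvcc")


if __name__ == "__main__":
    print(driver_covers_build("12.2", "12.1"))  # driver is new enough
    print(driver_covers_build("11.8", "12.1"))  # build is ahead of the driver
    print("nvcc:", nvcc_path())
```

A None from `nvcc_path()` on a machine where the toolkit is installed is the PATH pitfall above.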

Once the three layers agree, treat that combination as load-bearing. Pin the toolkit and framework versions in your environment file, and record the driver version alongside them, so the next machine reproduces the setup instead of rediscovering the same afternoon of debugging. The next time something breaks, the first useful question is always whether one of those versions moved. A written record turns a vague guessing session into a quick diff against a known-good setup.
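A known-good record can be as small as a JSON file plus a diff. This is a minimal sketch. The component keys and the file path you pass in are up to you, and the helper names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def save_record(path: str, versions: Dict[str, str]) -> None:
    """Write the working versions to a JSON file you keep with the project."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(versions, fh, indent=2, sort_keys=True)


def load_record(path: str) -> Dict[str, str]:
    """Read a previously saved known-good record."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def diff_versions(
    known_good: Dict[str, str], current: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {component: (known_good, current)} for every entry that moved."""
    keys = sorted(set(known_good) | set(current))
    return {
        key: (known_good.get(key), current.get(key))
        for key in keys
        if known_good.get(key) != current.get(key)
    }
```

Suppose only the driver changed from 535.104.05 to 550.54.14 (made-up versions). Then `diff_versions` returns `{'driver': ('535.104.05', '550.54.14')}`, and that one line is where the next debugging session starts.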
