# Run uncensored LLMs locally — the access nobody can revoke

> When a government can order an AI cut off overnight, the answer is open weights on your own hardware. Pick a runtime, pick a model, quantize it to fit, and run it offline — no account, no logs, no kill switch.

Markdown twin of https://xmr.club/guides/local-uncensored-llms. CC-BY-4.0. Attribute "xmr.club".

## At a glance

- Canonical: https://xmr.club/guides/local-uncensored-llms
- Slug: local-uncensored-llms
- Title: Run uncensored LLMs locally — the access nobody can revoke
- Description: When a government can order an AI cut off overnight, the answer is open weights on your own hardware. Pick a runtime, pick a model, quantize it to fit, and run it offline — no account, no logs, no kill switch.
- Available locales: en, zh, es, ru
  - zh: https://xmr.club/zh/guides/local-uncensored-llms
  - es: https://xmr.club/es/guides/local-uncensored-llms
  - ru: https://xmr.club/ru/guides/local-uncensored-llms

## Intro

On 12 June 2026 a single export-control directive forced Anthropic to suspend Fable 5 and Mythos 5 for every foreign national, overnight. Hosted intelligence is a permission — and permissions get revoked. A model whose weights live on your own disk has none of that fragility. Here is how to run one.

## Body

## Why local, why now

On 12 June 2026 a US export-control directive forced Anthropic to suspend access to Fable 5 and Mythos 5 for every foreign national — overnight, no wrongdoing required. A policy change upstream, and hundreds of millions of people lost a tool they relied on. That is the structural risk of renting intelligence from a gatekeeper: access is a permission, and permissions get revoked, geofenced, repriced, or logged.

A model whose weights live on your own disk has none of that fragility. It can't be cut off by a directive you never saw, throttled, or quietly fine-tuned against you. Open-weight models are to AI what running your own node is to Bitcoin: clunkier than the hosted option, and yours in a way the hosted option can never be.

"Uncensored" here means two things: weights you can run with no API gate, and fine-tunes that don't refuse benign requests. Both matter — but neither makes a model smarter or more truthful. Treat the outputs like any tool's: useful, fallible, and your responsibility.

## The hardware reality (and the quantization cheat)

The one number that matters is memory — VRAM if you have a GPU, system RAM if you don't. The trick that makes local models practical is **quantization**: compressing weights from 16-bit down to 4-bit with little quality loss. A rough rule for a 4-bit (Q4_K_M) GGUF model:

- **7–8B parameters:** ~5 GB. Runs on a laptop, even CPU-only (slowly). 8 GB VRAM is comfortable.

- **13–14B:** ~9 GB. A 12 GB GPU or a 16 GB Mac.

- **30–34B:** ~20 GB. A 24 GB GPU (3090/4090) or a 32 GB Mac.

- **70B:** ~42 GB. Two 24 GB GPUs, a 48 GB card, or a 64 GB+ Mac.

Apple Silicon punches above its weight because the GPU shares system RAM — a 64 GB Mac runs models a comparable PC needs two graphics cards for. No GPU at all? A 7B still runs on CPU; expect a few tokens per second, not dozens.

## Pick a runtime

- **Ollama** — the easiest start. One install, then `ollama run llama3.1` pulls and runs a model. Exposes a local API on port 11434 that most chat UIs speak. Recommended for almost everyone.

- **LM Studio** — a polished desktop GUI. Browse and download models from Hugging Face, chat, and expose an OpenAI-compatible local server. Best if you want zero terminal.

- **llama.cpp** — the bare-metal engine under most of the others. Maximum control and the widest hardware support; you build it and manage GGUF files yourself.

- **vLLM / TGI** — for serving one model to many requests at speed on a real GPU. Overkill for one person; right for a shared box.

## Pick a model

Start with a strong open-weight base, then choose a fine-tune if you want fewer refusals:

- **Open-weight bases:** Llama (Meta), Qwen (Alibaba), Mistral / Mixtral, Gemma (Google), DeepSeek. All downloadable, all run locally. Qwen and Llama 8B are the best "fits on a laptop" all-rounders today.

- **Uncensored / "abliterated" fine-tunes:** the Dolphin series, Nous Hermes, and "abliterated" builds (a technique that surgically removes the refusal direction from an existing model). They answer instead of lecturing — useful for security research, fiction, and edge-case questions a hosted model nannies. The cost: they hallucinate at least as much, sometimes more, with no guardrail between you and a confidently wrong answer.

Get weights from Hugging Face. With Ollama, most popular uncensored builds are one command away (e.g. `ollama run dolphin-mistral`).

## Quickstart with Ollama

`// install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

// pull + chat with an 8B all-rounder
ollama run llama3.1:8b

// an uncensored fine-tune instead
ollama run dolphin-mistral

// list what you have, free disk later
ollama list
ollama rm dolphin-mistral`
Point any OpenAI-compatible client at `http://localhost:11434/v1` and you have a private, local drop-in. For a chat UI, Open WebUI runs in one container and talks to Ollama out of the box.

## Make it truly private

- **Pull weights, then go offline.** Once a model is on disk it needs no network at all. Download over Tor or a VPN if you'd rather Hugging Face / the registry not log your IP against a model list.

- **Block the runtime from phoning home.** Ollama and llama.cpp run fully local, but firewall the process anyway (or run it on an air-gapped box) so a future update can't add telemetry behind your back.

- **Keep prompts on-device.** The whole point: your conversations never leave the machine. No account, no server-side history, nothing to subpoena.

- **Disk encryption matters more now.** Your prompt history and any saved chats live locally — full-disk encryption (see our device guides) is the backstop if the hardware is seized or lost.

## Honest caveats

- **Uncensored is not smarter.** Removing refusals doesn't add knowledge or accuracy. An abliterated 8B is still an 8B.

- **Local is not frontier.** A 70B on your desk is genuinely useful, but it won't match the best hosted models on the hardest tasks. The trade you're making is capability for sovereignty — go in clear-eyed.

- **You own the output.** No provider is filtering on your behalf, which is the point — and the responsibility. What you generate and do with it is on you.

## See also

The sovereignty logic here is the same one behind every listing on this site: tools you control beat tools you rent. See our [OPSEC52 series](/opsec) for the broader threat-model work, and [/vpns](/vpns) if you want to download weights without your ISP building a profile.

## How to cite

Source: xmr.club, "Run uncensored LLMs locally — the access nobody can revoke". https://xmr.club/guides/local-uncensored-llms (CC-BY-4.0).

## Related

- https://xmr.club/guides — full guides index (41 guides)
- https://xmr.club/methodology — how the directory grades providers referenced in this guide
- https://xmr.club/transparency — funding model + editorial firewall
- https://xmr.club/data.json — full provider dataset (CC-BY-4.0)

## License

CC-BY-4.0. Attribute "xmr.club".