# Run uncensored LLMs locally — the access nobody can revoke

> When a government can order an AI cut off overnight, the answer is open weights on your own hardware. Pick a runtime, pick a model, quantize it to fit, and run it offline: no account, no logs, no kill switch.

Markdown twin of https://xmr.club/guides/local-uncensored-llms. CC-BY-4.0. Attribute "xmr.club".

## At a glance

- Canonical: https://xmr.club/guides/local-uncensored-llms
- Slug: local-uncensored-llms
- Available locales: en, zh, es, ru
  - zh: https://xmr.club/zh/guides/local-uncensored-llms
  - es: https://xmr.club/es/guides/local-uncensored-llms
  - ru: https://xmr.club/ru/guides/local-uncensored-llms

## Intro

On 12 June 2026 a single export-control directive forced Anthropic to suspend Fable 5 and Mythos 5 for every foreign national, overnight. Hosted intelligence is a permission, and permissions get revoked. A model whose weights live on your own disk has none of that fragility. Here is how to run one.

## Body

## Why local, why now

On 12 June 2026 a US export-control directive forced Anthropic to suspend access to Fable 5 and Mythos 5 for every foreign national, overnight and with no wrongdoing required. One policy change upstream, and hundreds of millions of people lost a tool they relied on. That is the structural risk of renting intelligence from a gatekeeper: access is a permission, and permissions get revoked, geofenced, repriced, or logged. A model whose weights live on your own disk has none of that fragility. It can't be cut off by a directive you never saw, throttled, or quietly fine-tuned against you.

Open-weight models are to AI what running your own node is to Bitcoin: clunkier than the hosted option, and yours in a way the hosted option can never be. "Uncensored" here means two things: weights you can run with no API gate, and fine-tunes that don't refuse benign requests. Both matter, but neither makes a model smarter or more truthful. Treat the outputs like any tool's: useful, fallible, and your responsibility.

## The hardware reality (and the quantization cheat)

The one number that matters is memory: VRAM if you have a GPU, system RAM if you don't. The trick that makes local models practical is **quantization**: compressing weights from 16-bit down to roughly 4-bit with little quality loss. A rough rule for a 4-bit (Q4_K_M) GGUF model:

- **7–8B parameters:** ~5 GB. Runs on a laptop, even CPU-only (slowly). 8 GB VRAM is comfortable.
- **13–14B:** ~9 GB. A 12 GB GPU or a 16 GB Mac.
- **30–34B:** ~20 GB. A 24 GB GPU (3090/4090) or a 32 GB Mac.
- **70B:** ~42 GB. Two 24 GB GPUs, a 48 GB card, or a 64 GB+ Mac.

Treat these as approximate weight sizes; the context window needs memory on top, so leave some headroom.

Apple Silicon punches above its weight because the GPU shares system RAM: a 64 GB Mac runs models that a comparable PC needs two graphics cards for. No GPU at all? A 7B still runs on CPU; expect a few tokens per second, not dozens.

## Pick a runtime

- **Ollama:** the easiest start. One install, then `ollama run llama3.1` pulls and runs a model. It exposes a local API on port 11434 that most chat UIs speak. Recommended for almost everyone.
- **LM Studio:** a polished desktop GUI. Browse and download models from Hugging Face, chat, and expose an OpenAI-compatible local server. Best if you want zero terminal.
- **llama.cpp:** the bare-metal engine under most of the others. Maximum control and the widest hardware support; you build it and manage GGUF files yourself.
- **vLLM / TGI:** for serving one model to many concurrent requests at speed on a dedicated GPU. Overkill for one person; right for a shared box.
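
Before choosing a model, it helps to check what will fit in memory. The sizing list in the hardware section is roughly parameters times bits per weight, divided by eight. Below is a minimal shell sketch of that arithmetic, under stated assumptions: `estimate_gb` is a hypothetical helper name, and the default of 4.8 bits per weight is an assumed average for Q4_K_M, chosen because it lands close to the list's figures rather than taken from any official number. It counts weights only, so real usage will be somewhat higher once context is loaded.

```sh
#!/bin/sh
# Rough weights-only size of a quantized model, in decimal GB.
# usage: estimate_gb PARAMS_IN_BILLIONS [BITS_PER_WEIGHT]
# The 4.8 default is an assumed average for Q4_K_M, not an official figure.
estimate_gb() {
    # billions of parameters * bits per weight / 8 bits per byte = GB
    LC_ALL=C awk -v params="$1" -v bits="${2:-4.8}" \
        'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

# Print an estimate for the upper end of each size class in the list.
for size in 8 14 34 70; do
    printf '%sB at ~4.8 bits/weight: ~%s GB\n' "$size" "$(estimate_gb "$size")"
done
```

Passing a second argument changes the precision: `estimate_gb 8 16` gives the unquantized 16-bit size of an 8B model, which shows how much quantization saves before any context is counted.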
## Pick a model

Start with a strong open-weight base, then choose a fine-tune if you want fewer refusals:

- **Open-weight bases:** Llama (Meta), Qwen (Alibaba), Mistral / Mixtral, Gemma (Google), DeepSeek. All downloadable, all run locally. Qwen and Llama in the 7–8B range are the best "fits on a laptop" all-rounders today.
- **Uncensored / "abliterated" fine-tunes:** the Dolphin series, Nous Hermes, and "abliterated" builds (abliteration is a technique that surgically removes the refusal direction from an existing model). They answer instead of lecturing, which is useful for security research, fiction, and edge-case questions that a hosted model refuses or hedges on. The cost: they hallucinate at least as much, sometimes more, with no guardrail between you and a confidently wrong answer.

Get weights from Hugging Face. With Ollama, most popular uncensored builds are one command away (e.g. `ollama run dolphin-mistral`).

## Quickstart with Ollama

```sh
# install (macOS/Linux)
curl -fsSL https://ollama.com/install.sh | sh

# pull + chat with an 8B all-rounder
ollama run llama3.1:8b

# an uncensored fine-tune instead
ollama run dolphin-mistral

# list what you have, free disk later
ollama list
ollama rm dolphin-mistral
```

Point any OpenAI-compatible client at `http://localhost:11434/v1` and you have a private, local drop-in. For a chat UI, Open WebUI runs in one container and talks to Ollama out of the box.

## Make it truly private

- **Pull weights, then go offline.** Once a model is on disk it needs no network at all. Download over Tor or a VPN if you'd rather Hugging Face or the model registry not log your IP against the models you pull.
- **Block the runtime from phoning home.** Ollama and llama.cpp run fully local, but firewall the process anyway (or run it on an air-gapped box) so a future update can't add telemetry behind your back.
- **Keep prompts on-device.** The whole point: your conversations never leave the machine. No account, no server-side history, nothing for a provider to hand over under subpoena.
- **Disk encryption matters more now.** Your prompt history and any saved chats live locally, so full-disk encryption (see our device guides) is the backstop if the hardware is seized or lost.

## Honest caveats

- **Uncensored is not smarter.** Removing refusals doesn't add knowledge or accuracy. An abliterated 8B is still an 8B.
- **Local is not frontier.** A 70B on your desk is genuinely useful, but it won't match the best hosted models on the hardest tasks. The trade you're making is capability for sovereignty, so go in clear-eyed.
- **You own the output.** No provider is filtering on your behalf, which is both the point and the responsibility. What you generate and do with it is on you.

## See also

The sovereignty logic here is the same one behind every listing on this site: tools you control beat tools you rent. See our [OPSEC52 series](/opsec) for the broader threat-model work, and [/vpns](/vpns) if you want to download weights without your ISP building a profile.

## How to cite

Source: xmr.club, "Run uncensored LLMs locally — the access nobody can revoke". https://xmr.club/guides/local-uncensored-llms (CC-BY-4.0).

## Related

- https://xmr.club/guides - full guides index (41 guides)
- https://xmr.club/methodology - how the directory grades providers referenced in this guide
- https://xmr.club/transparency - funding model + editorial firewall
- https://xmr.club/data.json - full provider dataset (CC-BY-4.0)

## License

CC-BY-4.0. Attribute "xmr.club".