engeldlgado Posted 11 hours ago Share Posted 11 hours ago Hi all, im sharing a project that might be useful to anyone running an AMD GPU on a Hackintosh. The problem: local‑LLM tooling on macOS targets Apple Silicon. On Intel Macs with discrete AMD GPUs, stock llama.cpp under Metal produces corrupted output and is painfully slow over PCIe. ToshLLM is a native SwiftUI app (pure Swift Package Manager, no external deps) that bundles llama.cpp built with AMD‑specific patches and wraps it in a real GUI: Correct Metal output on AMD dGPUs at full speed Qwen3‑8B Q4: ~101 t/s prompt / ~57 t/s generation Qwen3.6‑35B‑A3B (MoE, hybrid offload): ~123 t/s / ~18.6 t/s, up to ~25.7 t/s with MTP Native chat (Markdown, code copy, file attachments), model manager with per‑model VRAM/RAM estimates, automatic MoE CPU‑offload, MTP speculative decoding, dual engines (official + TurboQuant for 100k+ ctx), built‑in benchmarks, OpenAI‑compatible API, bilingual EN/ES New macOS 26 “Tahoe” Liquid Glass interface (degrades to translucent materials on macOS 14/15) Hardware: developed on RX 6700 XT 12 GB + [NootRX](https://github.com/ChefKissInc/NootRX); runs on any working Metal setup Its beta. DMGs aren’t notarized yet (first launch needs “Open Anyway” or `xattr -dr com.apple.quarantine`). The AMD patches live in the repo (`patches/`), so you can build from source too. License: GPL‑3.0. Repo, source and DMG releases: Link to Github Project Would love testing reports from other AMD cards (6600/6800/6900, RDNA3, Polaris/Vega). 2 Quote Link to comment https://www.insanelymac.com/forum/topic/362881-app-toshllm-%E2%80%94-local-llms-on-intel-amd-gpu-metal-amd%E2%80%91patched-llamacpp-open-source/ Share on other sites More sharing options...
mitch_de Posted 6 hours ago Share Posted 6 hours ago Hi, worked on my Hackintosh with RX 5600 XT - I5 12400F 4,8 GHZ OC DDR5 ( and Macbook Pro RX560x - but very slow) The smallest LLM gave 52.1 / 100.0 in the benchmark . close to your 6700XT? Mobile RX 560X on MacbookPro Quote Link to comment https://www.insanelymac.com/forum/topic/362881-app-toshllm-%E2%80%94-local-llms-on-intel-amd-gpu-metal-amd%E2%80%91patched-llamacpp-open-source/#findComment-2851061 Share on other sites More sharing options...
engeldlgado Posted 3 hours ago Author Share Posted 3 hours ago (edited) Nice, thanks for testing! That RX 5600 XT result is genuinely great. Just a heads-up for the comparison: you ran Qwen3-4B, while my 101/57 numbers are for the bigger Qwen3-8B — so not quite the same test. Your prompt speed basically matches my 6700 XT; generation is a bit lower because it's bandwidth-bound and the 5600 XT has less memory bandwidth (no Infinity Cache). It'll fit the 8B fine too (4.7 GB) if you want a direct apples-to-apples run — and your DDR5 + 12400F will really shine on the bigger MoE models. The MacBook's RX 560X being slow is expected — that's an old Polaris chip with very little VRAM and bandwidth, so generation falls off a cliff (the model can't really stay resident on the GPU). Prompt still looks OK because it's batched, but ~1 t/s gen is just the card showing its age. The 5600 XT is the one to use. For that 4B model i got 68 t/s | 146 t/s on the RX 6700 XT If you grab the 8B and share the numbers, I'd love to add the RX 5600 XT as a tested card. Appreciate the report! Edited 3 hours ago by engeldlgado Quote Link to comment https://www.insanelymac.com/forum/topic/362881-app-toshllm-%E2%80%94-local-llms-on-intel-amd-gpu-metal-amd%E2%80%91patched-llamacpp-open-source/#findComment-2851062 Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.