Jump to content
3 posts in this topic

Recommended Posts

Hi all, im sharing a project that might be useful to anyone running an AMD GPU on a Hackintosh.
 

The problem: local‑LLM tooling on macOS targets Apple Silicon. On Intel Macs with discrete AMD GPUs, stock llama.cpp under Metal produces corrupted output and is painfully slow over PCIe.

 

ToshLLM is a native SwiftUI app (pure Swift Package Manager, no external deps) that bundles llama.cpp built with AMD‑specific patches and wraps it in a real GUI:

  • Correct Metal output on AMD dGPUs at full speed
  • Qwen3‑8B Q4: ~101 t/s prompt / ~57 t/s generation
  • Qwen3.6‑35B‑A3B (MoE, hybrid offload): ~123 t/s / ~18.6 t/s, up to ~25.7 t/s with MTP
  • Native chat (Markdown, code copy, file attachments), model manager with per‑model VRAM/RAM estimates, automatic MoE CPU‑offload, MTP speculative decoding, dual engines (official + TurboQuant for 100k+ ctx), built‑in benchmarks, OpenAI‑compatible API, bilingual EN/ES
  • New macOS 26 “Tahoe” Liquid Glass interface (degrades to translucent materials on macOS 14/15)
  • Hardware: developed on RX 6700 XT 12 GB + [NootRX](https://github.com/ChefKissInc/NootRX); runs on any working Metal setup


Its beta. DMGs aren’t notarized yet (first launch needs “Open Anyway” or `xattr -dr com.apple.quarantine`). The AMD patches live in the repo (`patches/`), so you can build from source too.

License: GPL‑3.0. Repo, source and DMG releases:

 

Link to Github Project

 

Would love testing reports from other AMD cards (6600/6800/6900, RDNA3, Polaris/Vega).

SCR-20260613-lskf.png

  • Like 2

Hi,

worked on my Hackintosh with RX 5600 XT  - I5 12400F 4,8 GHZ OC DDR5 ( and Macbook Pro RX560x - but very slow)

The smallest LLM gave 52.1 / 100.0 in the benchmark . close to your 6700XT?

 

Bildschirmfoto 2026-06-14 um 09.35.11.jpg

 

Mobile RX 560X on MacbookPro

llm_560x.jpeg

Nice, thanks for testing! That RX 5600 XT result is genuinely great. Just a heads-up for the comparison: you ran Qwen3-4B, while my 101/57 numbers are for the bigger Qwen3-8B — so not quite the same test. Your prompt speed basically matches my 6700 XT; generation is a bit lower because it's bandwidth-bound and the 5600 XT has less memory bandwidth (no Infinity Cache). It'll fit the 8B fine too (4.7 GB) if you want a direct apples-to-apples run — and your DDR5 + 12400F will really shine on the bigger MoE models.


The MacBook's RX 560X being slow is expected — that's an old Polaris chip with very little VRAM and bandwidth, so generation falls off a cliff (the model can't really stay resident on the GPU). Prompt still looks OK because it's batched, but ~1 t/s gen is just the card showing its age. The 5600 XT is the one to use.

For that 4B model i got 68 t/s | 146 t/s on the RX 6700 XT

If you grab the 8B and share the numbers, I'd love to add the RX 5600 XT as a tested card. Appreciate the report!

Edited by engeldlgado

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.
Note: Your post will require moderator approval before it will be visible.

Guest
Unfortunately, your content contains terms that we do not allow. Please edit your content to remove the highlighted words below.
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...