
[App] ToshLLM — local LLMs on Intel + AMD GPU (Metal, AMD‑patched llama.cpp, open source)


36 posts in this topic


5 minutes ago, engeldlgado said:

Thanks for the feedback.

Sure thing man.

 

6 minutes ago, engeldlgado said:

Was the AI in the middle of generating a response when this error occurred?

No, I just wanted to test how the app handles file attachments. I attached several files and the error occurred, but it was able to analyze a single, fairly short text file without any errors. I should add that the files I attached were pretty large, so I guess that's what caused the error.

 

8 minutes ago, engeldlgado said:

I will note it down, but keep in mind that hardware combination might simply hit its limits when benchmarking a 4B model like Qwen3.

Thanks. Yeah, I didn't expect much from that rig, but since you asked for benchmarks on Polaris/Vega GPUs, I thought I'd share my experience.

10 minutes ago, engeldlgado said:

Also, have you tested the experimental engine? It may work better because it has a custom kernel for AMD.

I will give it a try later and keep you posted.

15 hours ago, engeldlgado said:

Try the smallest one first to test. By the way, what are your system specs?

Qwen 4B

image.thumb.png.6127489d1271f978bf97ae77d3df5524.png

I have both systems in my signature. Both are based on Coffee Lake CPUs, with an RX 560 and an RX 580.

 

17 minutes ago, XanthraX said:

I have both systems in my signature. Both are based on Coffee Lake CPUs, with an RX 560 and an RX 580.

 

 

Sorry, I didn't notice because I was on my phone when I replied to you.

 

Your RX 580 is GCN/Polaris, not RDNA+.

My AMD decode kernel is currently only instantiated for RDNA+ (RX 5000/6000 series). It may work on other cards, but that needs further testing, so ToshLLM won't work on your RX 580 and RX 560 at the moment.

 

But I'm going to study integrating a GCN/Polaris-compatible patch. I'll need to rewrite the kernel to use 64-lane SIMD groups instead of RDNA's 32-lane simdgroups, which is more complex, but I'm interested in exploring it.
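To illustrate why the lane count matters, here is a small Python model of a shuffle-down tree reduction. It is only a sketch under stated assumptions: it is not Metal code and not ToshLLM's actual kernel, and the names `shuffle_down` and `simd_sum` are made up for this example. It shows that a reduction hard-coded for 32 lanes only sums half of a 64-lane group.

```python
def shuffle_down(lanes, delta):
    """Model of a shuffle-down: lane i reads lane i + delta.

    Lanes whose source would be out of range keep their own value
    (assumed behaviour for this sketch).
    """
    n = len(lanes)
    return [lanes[i + delta] if i + delta < n else lanes[i] for i in range(n)]


def simd_sum(lanes, assumed_width):
    """Tree reduction written for a group of `assumed_width` lanes.

    Returns the value held by lane 0 after the final step.
    """
    lanes = list(lanes)
    offset = assumed_width // 2
    while offset > 0:
        shifted = shuffle_down(lanes, offset)
        lanes = [a + b for a, b in zip(lanes, shifted)]
        offset //= 2
    return lanes[0]


# A 64-lane group (GCN-style) holding the values 0..63.
wave64 = list(range(64))

full = simd_sum(wave64, assumed_width=64)     # reduction sized for 64 lanes
partial = simd_sum(wave64, assumed_width=32)  # reduction hard-coded for 32 lanes

print(full, partial)
```

With the 32-lane assumption, lane 0 never sees lanes 32 to 63, so any kernel that bakes in that width has to be restructured for wider groups, not just recompiled.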

 

I'll also study the old llama-metal repo that I found while searching for this issue, to see whether I can port it, apply my patch to it, and optimize it better for GCN GPUs.

 

I'll update if I make progress on GCN support... would you be willing to test it when I get a working solution?

Edited by engeldlgado

Hi @Cyberdevs

Quick update on what I worked on today:

Update (v0.81.25): you can now attach files in chat — including PDFs (text is extracted automatically, and scanned PDFs are read with on-device OCR), plus more text formats. And image input for vision models is in: drop in a vision model with its mmproj (e.g. gemma-3-4b) and you can attach an image and ask about it. Vision is experimental and the image encoder runs partly on CPU on AMD GPUs (some Metal ops aren't supported), so it works but isn't fully GPU-accelerated yet. DMG is building now.

 

I've also added an option to change the default location for models:

image.thumb.png.7c891159ecdf92a80033fe3b3d6b6162.png


Also, may I ask you for a new test on the RX card? Update the app, then just load a model and start the server. No benchmark or anything else, just send me the logs. I'm researching the Vega/GCN cards.

Edited by engeldlgado
1 hour ago, Alpha22 said:

Settings?

 
Yeah, in Settings there is an option to change the Inference Engine (llama.cpp) bundle. The experimental one has improvements over the normal one.

image.thumb.png.20b24a1d47c3335ba38ebe6be2975f44.png

1 hour ago, engeldlgado said:

 
Yeah, in Settings there is an option to change the Inference Engine (llama.cpp) bundle. The experimental one has improvements over the normal one.

image.thumb.png.20b24a1d47c3335ba38ebe6be2975f44.png

Screenshot-2026-06-18-alle-19-47-44.png

On my AMD RX 6800 XT.

The top two benchmarks were run after enabling these settings in version 0.81.26:

2 hours ago, engeldlgado said:

Yeah, in Settings there is an option to change the Inference Engine (llama.cpp) bundle. The experimental one has improvements over the normal one.

01.png

 

I'll test my RX 580 later and post the results.

