The Galaxies demo comes from Apple, and SmallLuxGPU from independent developers. I think the latter invested much more time in OpenCL optimisation.
1400 Ksamples/s and 2900 Krays/s (luxball), GPU only, in v. 1.4.5.
With the CPUs enabled as well, the Radeon is only at 50% load (it was always near 100% in v. 1.41, which made the OS X interface really sluggish).
Why is the Radeon 4870 so much better than the 8800GTX in this test, but slower in the Galaxies one?
All the Apple OpenCL demos are meant more for learning than for high speed.
Also, try running hybrid with fewer CPU cores, for example 2 cores if you have 4. If all CPU cores are busy, the GPU may not be
"filled" with data fast enough.
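To make the rule of thumb concrete, here is a rough sketch of how one might pick the number of CPU render threads so that some cores stay free for feeding the GPU. The function name `hybrid_thread_plan` and its parameters are made up for illustration; this is not how SmallLuxGPU actually decides, just the arithmetic behind "try 2 cores if you have 4".

```python
def hybrid_thread_plan(total_cores, feeder_threads=2):
    """Hypothetical helper: leave one core free per GPU feeder thread.

    total_cores    -- number of CPU cores in the machine
    feeder_threads -- CPU threads that push data to the GPU (assumed 2)
    """
    if total_cores < 1 or feeder_threads < 0:
        raise ValueError("need at least one core and a non-negative feeder count")
    # Keep at least one render thread even on very small machines.
    render_threads = max(1, total_cores - feeder_threads)
    return {"render_threads": render_threads, "feeder_threads": feeder_threads}


if __name__ == "__main__":
    # 4 cores with 2 feeder threads -> render on 2 cores.
    print(hybrid_thread_plan(4))
```

If the GPU load stays low with this split, lowering the render thread count by one more is worth a try before concluding that the GPU itself is the limit.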
In the next version I will add an option to raise the number of CPU threads that put the data on the GPU: 2 now; 2, 3 or 4 in the next version.
This may produce more GPU load, but you may need to leave some CPU capacity free. When you run GPU only, you can see that there are (for now) 2 GPU threads. With the new test version you will see 2, 3 or 4 of them in GPU only (and also in hybrid).
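For anyone wondering what "threads which put the data on the GPU" means in practice, below is a minimal producer/consumer sketch. It is only an illustration under my own assumptions: the names `run_feeders`, `feeder` and `device` are hypothetical, the "GPU" is just a thread that drains a queue, and nothing here is taken from the SmallLuxGPU source.

```python
import queue
import threading


def run_feeders(num_feeders, batches_per_feeder):
    """Hypothetical demo: several CPU feeder threads fill one device queue."""
    work = queue.Queue(maxsize=8)  # bounded, like a limited GPU input buffer
    consumed = []

    def feeder(feeder_id):
        # Each feeder prepares batches and blocks when the queue is full.
        for i in range(batches_per_feeder):
            work.put((feeder_id, i))

    def device():
        # Stand-in for the GPU: takes batches until it sees the stop marker.
        while True:
            item = work.get()
            if item is None:
                break
            consumed.append(item)

    consumer = threading.Thread(target=device)
    consumer.start()
    feeders = [threading.Thread(target=feeder, args=(n,)) for n in range(num_feeders)]
    for t in feeders:
        t.start()
    for t in feeders:
        t.join()
    work.put(None)  # stop marker, sent only after every feeder has finished
    consumer.join()
    return len(consumed)


if __name__ == "__main__":
    print(run_feeders(2, 5))  # 2 feeder threads
    print(run_feeders(4, 5))  # 4 feeder threads
```

More feeders only help if the consumer is actually waiting for data; if the CPU cores are already saturated by rendering, the feeders themselves get starved, which is why leaving some CPU free matters.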
Stay tuned. I will post a test version (with the 2, 3 or 4 GPU threads) here today or tomorrow.