
All OpenCL Benches: RAYTRACING/Galaxies/Grass/qJulia/Displacement...


125 posts in this topic


Great classroom KSamples/sec!

I don't think your flashed GPU runs at a reduced clock speed. Usually, if something is wrong, it "hangs" at full clock speed and throttling doesn't work, not the other way around :(

Hint: don't read the wrong values.

Example screenshot: while the raytracing is still running (red values), the scene isn't fully rendered yet (samples = 0!). Wait until you see at least 32 samples; then the average KSamples/sec (white values at the bottom) is shown as well. That means you may have to wait more than 1 minute (in GPU-only mode) before the result values appear (complete scene rendered).

Bildschirmfoto_2010_03_07_um_06.33.30.jpg
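
The waiting rule from this post can be written down as a small check. This is only a sketch in Python, not SmallLuxGPU code: the function name, the layout of the `readings` tuples and the numbers in the usage note are made up for illustration. Only the threshold of 32 samples comes from the post.

```python
MIN_SAMPLES = 32  # threshold from the post; everything else here is illustrative


def average_ksamples_per_sec(readings, min_samples=MIN_SAMPLES):
    """Return the average KSamples/sec once the sample counter is high enough.

    readings: list of (elapsed_seconds, sample_counter, total_ksamples) tuples,
    a hypothetical log of what the screen shows over time.
    Returns None while the counter has not reached min_samples yet,
    because any value read before that point is not a valid result.
    """
    for elapsed, counter, total_ksamples in readings:
        if counter >= min_samples and elapsed > 0:
            return total_ksamples / elapsed
    return None
```

With an invented log such as `[(5.0, 0, 0.0), (30.0, 12, 30000.0), (70.0, 32, 77000.0)]` the function ignores the first two readings and only reports a value for the third one, which matches the advice to wait (possibly more than a minute) before reading the result.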

  • 2 weeks later...

Hey, seems like 10.6.3 has improved speed a bit.

On the same settings as above, I get 1100 Ksamples/sec and 4050 Krays/sec on the buddha scene (GPU alone). Comparable improvements on the classroom scene. :)

Need to test the newer version.

1400 Ksamples/sec and 2900 Krays/s (luxball), GPU only in v. 1.4.5.

With the CPUs enabled, the Radeon is only at 50% load (it was always near 100% in v. 1.41, which made the OS X interface really sluggish).

Why is the Radeon 4870 so much better than the 8800GTX in this test, but slower in the Galaxies one?


Galaxies comes from Apple; SmallLuxGPU comes from independent developers. I think the latter invested much more time in OpenCL optimisation.

All the Apple OpenCL demos are meant more for learning than for high speed.

 

Also try running hybrid mode with fewer CPU cores, for example 2 cores if you have 4. If all CPU cores are busy, the GPU may not be "fed" with data fast enough.

In the next version I will add an option to raise the number of (CPU) threads that feed data to the GPU: 2 now; 2, 3 or 4 in the next version.

This may produce more GPU load, but you may need some CPU load left over. When you run GPU only, you can see that there are (now) 2 GPU threads. With the new test version you will see 2, 3 or 4 of them in GPU only (and also in hybrid).

Stay tuned. I will post a test version (with the 2, 3 or 4 GPU threads) here today or tomorrow.

Uploaded 1.4.6 to macupdate, with an option to increase the GPU-only threads from 2 to 3 or 4. This may put even more load on the GPU = more samples/sec on very fast GPUs.

smalllux 1.4.6 at macupdate.com (the newest version is always there)

 

My 8800GTX now gets about 15% more samples/sec with 3 GPU-only threads. 4 threads give no further speedup; the GPU limit is already reached at 3 threads.

Because more GPU threads also put a higher load on the system, this isn't usable for hybrid mode. At least not on CPUs other than XEONs (8 real cores).

Also remember: for GPU benchmarks, only use the GPU-only modes!

 

What results do you get now with your ATI in GPU-only mode with 2, 3 and 4 GPU threads?

My results (8800GTX, luxball, GPU only, 2, 3 and 4 threads) are shown in the results window in the screenshot below.

 

I also put 1.4.6 on rapidshare, in case macupdate needs more time to show the newly uploaded version.

rs smalllux1.4.6

Bildschirmfoto_2010_03_21_um_21.44.16.jpg

Would be interesting !

The newest version, 1.5.2, also shows which GPUs are used on the main screen and has a new benchmark mode result screen.

In the new GPU benchmark mode the 9400M takes 156 sec and my 8800GTX 29 sec. That is the time needed to render a fixed scene (luxball) without screen output.

The 9600M GT should be at least 1.5 times faster than the 9400M, but still far behind real GPU cards like the 8800GT or the 4870 (which will be at least 2 times faster than the 8800GTX). I guess the 9600M GT will need 70-90 sec in that benchmark mode.

Your results (in sec) are welcome :)

Nvidia 8800GS, 8800GT, GTX 260, GTX 285 and ATI 4850, ATI 4890 results are needed to update the reference list.

Bildschirmfoto_2010_04_06_um_11.21.54.jpg
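
The arithmetic behind the estimate in this post can be checked quickly. A rough sketch in Python: the two measured times (156 sec and 29 sec) and the 1.5x and 70-90 sec figures come from the post, while the helper names are made up here, and it assumes that benchmark time scales inversely with GPU speed.

```python
def estimated_time(reference_seconds, speedup):
    """Time a card would need if it is `speedup` times faster than the reference."""
    return reference_seconds / speedup


def implied_speedup(reference_seconds, seconds):
    """How many times faster a card is than the reference, given its time."""
    return reference_seconds / seconds


T_9400M = 156.0   # sec, from the post
T_8800GTX = 29.0  # sec, from the post

# Measured: how much faster the 8800GTX is than the 9400M.
print(f"8800GTX vs 9400M: {implied_speedup(T_9400M, T_8800GTX):.2f}x")

# "At least 1.5 times faster" gives an upper bound on the 9600M GT time.
print(f"9600M GT at 1.5x: {estimated_time(T_9400M, 1.5):.0f} sec")

# The 70-90 sec guess corresponds to this speedup range over the 9400M.
low = implied_speedup(T_9400M, 90.0)
high = implied_speedup(T_9400M, 70.0)
print(f"70-90 sec means {low:.2f}x to {high:.2f}x")
```

So the 70-90 sec guess assumes the 9600M GT is somewhat better than the 1.5x minimum, which would only give 104 sec.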

I'm sorry to say: pwned :P XFX 4870 1GB at default clocks, 8.8 seconds FTW.

BTW, the GTS 250 is selling for about 100 pounds in the UK ATM.

I bought a 4870 512 MB in November for 89 pounds at ebuyer.

I bought another 4870 1GB in February for 99.99 from http://www.videocardshop.co.uk/

Anyone who wants a good card cheap, I suggest buying from here; there were only a few left when I bought mine and they won't last long at that price. Peace.

post-186801-1270690442_thumb.png

No, it is not optimized for ATI GPUs.

With OpenCL, unlike CUDA or ATI Stream, the OpenCL code is compiled at runtime (for the GPUs that are found) by the OpenCL drivers on the user's system.

The OpenCL part of SLG is NOT precompiled! So the GPU type doesn't matter. You can also precompile OpenCL (for a specific GPU type), but that isn't done in the OS X version, nor in the others. You could do it, but it makes no sense; then you'd be better off using ATI Stream or CUDA.

The OpenCL drivers, which do the main job of compiling the code on the fly, have a major effect on speed, besides the unit count and shader speed of the GPU.
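
To illustrate what "compiled at runtime by the driver" means, here is a minimal sketch using PyOpenCL, a Python binding for OpenCL. This is not how SLG itself is written; the kernel `scale` and the function `run_on_first_device` are made up for this example, and it assumes PyOpenCL, NumPy and at least one OpenCL device are available.

```python
# The kernel is shipped as plain source text, not as a GPU binary.
KERNEL_SOURCE = """
__kernel void scale(__global float *data, const float factor)
{
    int i = get_global_id(0);
    data[i] = data[i] * factor;
}
"""


def run_on_first_device(values, factor):
    """Compile KERNEL_SOURCE for whatever device is found and run it once."""
    # Imported lazily so this file loads even where PyOpenCL is not installed.
    import numpy as np
    import pyopencl as cl

    # Take the first device of the first platform (GPU or CPU, whatever exists).
    device = cl.get_platforms()[0].get_devices()[0]
    ctx = cl.Context(devices=[device])
    queue = cl.CommandQueue(ctx)

    # Here the driver's compiler turns the source into code for this device.
    program = cl.Program(ctx, KERNEL_SOURCE).build()

    host = np.asarray(values, dtype=np.float32)
    mf = cl.mem_flags
    buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host)

    # One work item per array element.
    program.scale(queue, host.shape, None, buf, np.float32(factor))

    result = np.empty_like(host)
    cl.enqueue_copy(queue, result, buf)
    return device.name, result
```

Calling `run_on_first_device([1.0, 2.0, 3.0], 2.0)` should return the device name and the doubled values. The same source text is handed to whichever driver is installed, which is why the quality of that driver's compiler shows up directly in the benchmark numbers.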

  • 2 weeks later...

My results with Smallux 1.5.7

OpenCL GPU 0: Radeon HD 4870 750 MHz

OpenCL CPU 1: Intel(R) Core(TM) i7 CPU 920 @2.67GHz 3000MHz

Real GPU = Gainward GS 4850-512 (Core Clock: 700 MHz; Memory Clock: 1100 MHz)

Highend hybrid: 11,1 sec

Highend CPU only: 27,3 sec

Ultra CPU only: 23,3 sec

Ultrahigh hybrid: 21,4 sec

GPU modes

Midrange GPU: 9,7 sec

High GPU: 14,1 sec

Ultrahigh: 29,6 sec

Thanks for comments

;)

  • 2 weeks later...

Thanks !

SLG updated to 1.5.8 !

All times have changed (no longer comparable to the old times).

Much better hybrid vs CPU-only benchmarks: the CPU limits the hybrid speed boost much less.


 

Smallux 1.5.8 with OSX 10.6.3

Luxball scene 640x480

Core i7 920 3GHz

ATI 4850 GS Gainward 512MB

 

Benchmark Modes CPU only and Hybrid

 

High CPU only : 62,4 s

Ultra High CPU only: 62,1s

 

Highend hybrid: 23,5s

Ultrahigh hybrid: 30,1s

 

Benchmark Modes GPU

 

Midrange GPU: 6,5s

Highend GPU: 27,3s

Ultrahigh GPU : 1,6s

:robot:
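
The result lists in this thread use a decimal comma, so they need a little care before any comparison. A small Python sketch that parses two of the lines posted above and computes how much faster hybrid is than CPU only; the function name `parse_seconds` is made up for this example.

```python
import re


def parse_seconds(text):
    """Parse a time such as '62,4 s' or '23,5s' (decimal comma) into a float."""
    match = re.search(r"(\d+(?:[.,]\d+)?)\s*s", text)
    if match is None:
        raise ValueError(f"no time found in {text!r}")
    return float(match.group(1).replace(",", "."))


# Two lines copied from the 1.5.8 results above.
cpu_only = parse_seconds("Ultra High CPU only: 62,1s")
hybrid = parse_seconds("Ultrahigh hybrid: 30,1s")

# Lower time is better, so the speedup is the CPU-only time over the hybrid time.
speedup = cpu_only / hybrid
print(f"Ultrahigh hybrid is {speedup:.2f}x faster than CPU only")
```

Remember that times from different SLG versions are not comparable, so only lines from the same post should be divided like this.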

Updated to 1.6.0

Bugfix for some gpu only results.

added OpenCL Pixelfilter bench.

 

Smallux 1.6.0 (SmalluxGPU v1.5beta2)

Luxball scene 640x480

4 Cores CPU + All GPUs

Intel Core i7 CPU 920@2.67GHz 3000MHz

Gainward ATI 4850 GS 510Mb

 

Benchmark Modes OPENCL CPU only and Hybrid

 

Midrange CPU only: 32,0s

Highend CPU only : 62,8 s

UltraHigh CPU only: 82,3s

 

Midrange hybrid: 14,3s

Highend hybrid: 22,2s

Ultrahigh hybrid: 32,2s

 

Benchmark GPU with OpenCL pixel filtering

 

Midrange GPU: 13,1s

Highend GPU: 16,7s

Ultrahigh GPU : 63,1s

;)

No, no errors

 

Every bench went smooth

:D

Thanks !

But what results do you get with the OpenCL pixelfilter bench, which doesn't give times as results?

You get three xxx M Samples/sec values as results; it always takes 30 sec to bench those 3 filters.

Updated to 1.6.1: a blank line between the OpenCL pixelfilter bench and the raytracing benches now shows more clearly that they are different benches :D

Also, i5/i7/Quad/XEON CPUs will perform faster in Ultra High hybrid and CPU only.

No other bench time changes.


 

Smallux 1.6.2 SmalluxGPU v1.5beta3dev OSX 10.6.3

Luxball scene 640x480

4 Cores CPU + All GPUs

Intel Core i7 CPU 920@2.67GHz 3000MHz

ATI 4850 GS Gainward 510Mb

 

Benchmark Modes OPENCL CPU only and Hybrid

 

Midrange CPU only: 31,0s

Highend CPU only : 61,0 s

UltraHigh CPU only: 55,0s

 

Midrange hybrid: 8,6s

Highend hybrid: 20,1s

Ultrahigh hybrid: 31,1s

 

Midrange GPU only : 12,6s

Highend GPU only : 23,9s

Ultrahigh GPU only : 63,1s

 

Benchmark GPU with OpenCL pixel filtering

Filter none: 796,59M

Filter Preview: 183,80M

Filter Gaussian: 88,59M

Thanks very much.

Great OpenCL pixel device filter speed on the ATI. It wasn't clear whether the ATI drivers already had that feature built in.

Also, the new settings for the CPU/GPU benches are now better balanced.

UltraHigh CPU uses more cores, therefore UltraHigh CPU only is even faster than Highend CPU only, as it should be, at least on quad-core CPUs.
