
Nvidia Fermi GTX 4xx, GTX2xx (+ others) Users for Benchmark WANTED



EDIT:

The DL link for the newest SLG version is always at macupdate.com.

EDIT 30.07. PerFinal V171_3

http://rapidshare.com/files/410151278/smallluxGPU171_V3.zip

 

http://www.macupdate.com/info.php/id/33632/smallluxgpu

 

Needed: all Nvidia GPUs >= 8800.

Select the luxball (standard scene) and the Benchmark GPU-only modes with 2, 3 and 4 GPU threads, and post your kSamples/sec for those GPU-only modes.

My GPU-only results (8800GTX) are shown in the screenshot.

A GTX 260 or better will perform much faster; a 9400M much slower.

 

 

EDIT: After a while I found that the GRASS OpenCL demo is also a good OpenCL benchmark.

I get 54 FPS with a 9600GT.

Bildschirmfoto_2010_03_21_um_21.44.16.jpg

Grass_OPENCL.zip

Bildschirmfoto_2010_11_17_um_11.38.53.jpg

Running 10.6.3

 

Intriguingly this test divides the workload across the cores of the 9800 GX2, and uses both G92 chips in concert.

 

Cinebench 11.5 opengl test yields 26.17 fps in 10.6.3 and 34.32 with Win7(64).

 

Openglviewer produced lower scores in 10.6.3 than the ~3200+ fps scores with 10.6.2. It reports that it is only using 16 compute units.

 

I would note that OpenGL 3.0 support was only at 65% with 10.6.2, while it's at 91% with 10.6.3.

two_threads.tiff

three_threads.tiff

four_threads.tiff

Thanks !

Can you please try the new 1.5.2 version, which reports an easily comparable time in seconds as the speed measure in the new Benchmark GPU mode?

The 8800GTX needs 28 sec, the 9400M 156 sec.

And the 9800 GX2 needs 17.8 seconds.

 

Small matters: the title bar and pull-down menu are in German, and I had to guess that I should go to MacUpdate to download the program, as you neglected to link to it here.

 

That aside, this is becoming an interesting little utility.

post-249157-1270599622_thumb.png

MSI GTX260 192 core on 10.6.3 using NVenabler.

 

With 2 threads I got 668 kSamples/sec; with 3 threads, 678 kSamples/sec average after 128 samples.

 

I used version 1.5.3 and "benchmark midrange CPU" resulted in 16.9 seconds, highend benchmark in 31.2 seconds.

 

Hope this helps with whatever you're doing.

Thanks !

Perhaps a GTX 285 or dual GTX 260 user can get closer to the ATI 4850 (High benchmark: 17 sec) or the ATI 4870 (15 sec)?

The GTX 260 at around 29 sec in High (my 8800GTX = 59 sec, 9600GT = 80 sec) is the fastest GTX GPU so far, but still far from the shader-unit speed of the 48xx.

The shader clock may also give a small speed boost: some GTX 260s showed 1348 MHz, others 1408 MHz, in the benchmark mode result window!

 

Thanks for the multi-GPU card 9800 GX2 test!

Could you perhaps use the newer slg 1.5.4 (in High Benchmark mode)? It needs about twice as many seconds, because High mode now does exactly double the work; the point was to shrink the relative OpenCL overhead, since compiling the OpenCL kernels on the fly always takes about 0.5-1.0 sec (CPU dependent).
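A minimal sketch of the arithmetic behind this change (the 1.0 sec compile overhead and 28 sec render time below are assumed example values, not measurements from the tool):

#include <stdio.h>

/* Why doubling the High-mode workload shrinks the relative OpenCL
 * compile overhead: the one-time compile cost stays fixed while the
 * render time doubles. Values are illustrative only. */
int main(void) {
    const double compile_overhead = 1.0;          /* sec, assumed on-the-fly compile cost */
    const double render_old = 28.0;               /* sec of rendering in the old High mode (example) */
    const double render_new = 2.0 * render_old;   /* new High mode does double the work */

    printf("old High: overhead = %.1f%% of total time\n",
           100.0 * compile_overhead / (compile_overhead + render_old));
    printf("new High: overhead = %.1f%% of total time\n",
           100.0 * compile_overhead / (compile_overhead + render_new));
    return 0;
}

With these example numbers the overhead drops from about 3.4% to about 1.8% of the total run time.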

http://www.macupdate.com/info.php/id/33632/smallluxgpu

 

It would also be interesting if you ran a GPU-only task with the Sponza scene, which is new and puts a huge load on the GPU.

I get an average of 16 kSamples/sec GPU-only, 3 threads, Sponza with my 8800GTX. Your two GPUs, shown in the help screen, should deliver at least 29 kSamples/sec.

Let the Sponza scene run a while - at least until the sample count goes from 0 to 16 or 32 - to get a stable average result.

 

EDIT: I got results from an iMac 27" ATI 4850M: 21 sec in High Benchmark mode. Slower than the 4870 (15 sec) but still faster than a GTX 260.

The shader throughput (lots of units) of the ATI 48xx can't be matched by the older GeForces.

But Fermi will, I am sure.

 

To be sure, the difference in overall gaming speed is nowhere near as big as the difference in OpenCL speed!

An ATI 4870 is not 4 times faster than an 8800GTX when running a game!

sponzagpu.jpg

"Thanks for the multi GPU card 9800X2 test !

Can you perhaps use newer slg 1.5.4 (in High Benchmark Mode) - gives 2 times more sec needed (High Mode does excat double work, reason was less % overhead for OpenCL in the time which is always about 0,5-1,0 sec CPU dependent for compiling OpenCL on the fly.)

 

Would be also interesting if you perform an GPU only task with sponza scene , which is new and does huge load to gpu.

I get avg. 16 kSamples/Sec GPU only, 3 threads sponza with my 8800GTX. Your two gpus, shown in help screen, should perform at least 29 kSamples/Sec.

Let sponza scene run a while - at least until samples goes from 0 to 16 or 32 to get stable avg. result."

 

 

 

Newer slg in High Benchmark Mode = 36.7 secs.

 

Ultrahighend Benchmark Mode = 53.7 secs.

 

Sponza scene with 48 samples, 3 threads, GPU only = 35k samples/sec.

 

(Using version 1.5.5)

Cheers.

 

Benched GTX 260 on its own before I eventually work out how to stick the second one in.

 

Midrange GPU - 16 seconds

High End GPU - 25 seconds

UltraHybrid Sponza - 22 seconds

Thanks !

Could you also compare High Hybrid vs. High CPU-only and Ultra Hybrid vs. Ultra CPU-only (both in the middle section of the screen, not the CPU-only column on the right - the newest V1.5.7 is needed)?

 

http://www.macupdate.com/info.php/id/33632/smallluxgpu

 

High Hybrid vs. High CPU-only on my 8800GTX = 16 sec vs. 31 sec - the GPU boost is good, roughly halving the time (a faster CPU with the same GPU means a smaller relative saving).

Ultra Hybrid vs. Ultra CPU-only shows much less GPU boost ("only" about 20% of the time saved), because a C2D CPU is already near full load with the CPU tasks and can't feed the GPU with data fast enough.

So CPUs with 4 or more real (not virtual) cores will see a higher boost percentage in Ultra Hybrid too, but still not as big a boost as in High Hybrid.
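For reference, here is a small sketch of how the speedup and time-saving figures in this thread can be derived from a CPU-only time and a hybrid time (the 31 sec / 16 sec pair is the 8800GTX High result quoted above; everything else is generic):

#include <stdio.h>

/* Derive speedup and percent of time saved from two benchmark times. */
static void report(const char *label, double cpu_only_sec, double hybrid_sec) {
    double speedup = cpu_only_sec / hybrid_sec;                          /* e.g. ~1.9x */
    double saving  = 100.0 * (cpu_only_sec - hybrid_sec) / cpu_only_sec; /* % of CPU-only time saved */
    printf("%s: %.2fx speedup, %.0f%% of the CPU-only time saved\n",
           label, speedup, saving);
}

int main(void) {
    report("High (8800GTX)", 31.0, 16.0);   /* numbers from the post above */
    return 0;
}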


Ultra High GPU-only was a bug.

Version 1.6.0 is now available!

I added OpenCL pixel filter benches and cleaned up the GUI.

Now all GPU-only benches sit beside the CPU-only and hybrid ones and use the same settings. Before, the GPU-only benches had their own settings compared to hybrid + CPU-only.

Now it's clearer and should be bug-free.

Ready to collect reference results again (these should hold for the next versions).

 

Attached: pixel-filter MegaSamples/sec of the 8800GTX and the Ultra GPU-only run (a 4870 will perform much faster, but no longer 1.6 sec :rolleyes: )

pixelfilt.jpg

Bildschirmfoto_2010_05_07_um_11.48.01.jpg

The 8800GTX is much faster than the 8800GT. Against an 8800GT the 9800 GX2 would look better :)

The 9800 GX2 can't get near 2x 8800GTX.

The CPU may also be "too slow" to feed both OpenCL GPUs fast enough.

Try High-end CPU-only vs. Hybrid - you may see a bigger advantage than my 8800GTX high-end values show.

 

I also got GT120 results (Mac Pro 2009):

Ulta_GT120.jpg

 

Ultra GPU-only: 280 sec - so don't worry about the 9800 GX2 ;)

You can even see here that OpenCL with very fast CPUs (Mac Pro 2009) and a slow GPU is the worst case - hybrid is even slower than CPU-only.

The OpenCL overhead in hybrid mode makes slow GPUs useless when paired with very fast CPUs (4+ cores).

But most of us will NOT have such a combination of 2x Xeon + GT120 - I hope ;)

 

PS: I also got ATI 5870 (Windows) OpenCL pixel-filter values!

 

AddSample[FILTER_NONE] Benchmark

[CypressPixel][Samples/sec 1669.42M]

 

AddSample[FILTER_PREVIEW] Benchmark

[CypressPixel][Samples/sec 369.56M]

 

AddSample[FILTER_GAUSSIAN] Benchmark

[CypressPixel][Samples/sec 217.81M]

8800GTX is much faster than 8800GT...

 

Mitch:

 

Thanks for the reply, but I guess I wasn't quite clear. It's the OpenCL Pixelfilter test that produces results that appear inconsistent or anomalous. In all the other tests the 9800GX2 predictably bests the 8800GTX. In the Pixelfilter run the 9800GX2 only processes two thirds of the information in 30 secs that the 8800GTX does in the same time. It is as if the Pixelfilter test does not use both cores of the 9800GX2. Could this be a bug?

Ah, I understand now. I will ask the benchpixel devs whether it also uses all GPUs.

But for sure, benchpixel uses VRAM much more heavily and more often than the raytracing benches. I don't know whether, on older dual-GPU cards, concurrent VRAM usage (read/write) causes a slowdown that reduces the overall VRAM speed of a two-GPU card compared with a one-GPU card.

For a closer look, start benchpixel in Terminal and post the output - there we can see how many GPU devices are used. Compare your device info with mine.

 

8800GTX

Device 0,1 = cpu cores

Device 2 = GPU (single 8800GTX)

 

 

mitch:~ ami$ /Users/ami/Desktop/benchpixel

LuxRays Simple PixelDevice Benchmark v0.1alpha7dev

Usage (easy mode): /Users/ami/Desktop/benchpixel

OpenCL Platform 0: Apple

Device 0 NativeThread name: NativeThread-000

Device 1 NativeThread name: NativeThread-001

Device 2 OpenCL name: GeForce 8800 GTX

Device 2 OpenCL type: GPU

Device 2 OpenCL units: 16

Device 2 OpenCL max allocable memory: 192MBytes

Device 3 OpenCL name: Intel® Core™2 Duo CPU E7300 @ 2.66GHz

Device 3 OpenCL type: CPU

Device 3 OpenCL units: 2

Device 3 OpenCL max allocable memory: 1024MBytes

Selected pixel device: GeForce 8800 GTX

Creating 1 pixel device(s)

Allocating pixel device 0: GeForce 8800 GTX (Type = OPENCL)

benchpixel.zip
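In case anyone wants to double-check which devices OpenCL exposes without running benchpixel, here is a small stand-alone listing sketch. This is not the benchpixel source; it just prints roughly the same per-device info using the standard OpenCL API. Build on OS X with something like: gcc listcl.c -framework OpenCL -o listcl (the file name is arbitrary).

#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/cl.h>
#else
#include <CL/cl.h>
#endif

int main(void) {
    cl_platform_id platform;
    cl_device_id devices[16];
    cl_uint num_devices = 0;

    /* First platform (Apple on OS X), then every device it exposes. */
    if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) return 1;
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 16, devices, &num_devices);

    for (cl_uint i = 0; i < num_devices; ++i) {
        char name[256];
        cl_device_type type;
        cl_uint units;
        cl_ulong max_alloc;

        clGetDeviceInfo(devices[i], CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(units), &units, NULL);
        clGetDeviceInfo(devices[i], CL_DEVICE_MAX_MEM_ALLOC_SIZE, sizeof(max_alloc), &max_alloc, NULL);

        printf("Device %u: %s (%s), %u compute units, max alloc %lluMB\n",
               i, name, (type & CL_DEVICE_TYPE_GPU) ? "GPU" : "CPU",
               units, (unsigned long long)(max_alloc >> 20));
    }
    return 0;
}

A dual-GPU card like the 9800 GX2 should show up as two separate GPU devices in this list, which is the quickest way to see whether both halves are visible to OpenCL at all.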

Ah, I understand now. I will ask the benchpixel devs whether it also uses all GPUs...

 

It appears the test is using both GPUs and all memory. The 9800GX2 does better than the 8800GTX in every other test. It may be a quirk of the card design with just this test, or it could be a bug in the test itself. In WinWorld I've run many tests on the 9800GX2 while considering overclocking its BIOS. Watching processor temps and GPU usage, I have noticed that some benchmark and stress programs do not actually use both GPUs, even though they see both. Has this test been run on other two-GPU cards or multi-card setups?

 

Let me know how it goes. I am curious.

 

 

terminal_pixel.rtf

Yep, benchpixel uses both GPUs.

Maybe because it also runs 4 threads on the CPU instead of 2 (quad core vs. C2D), the problem could be that the CPU can't feed the GPU fast enough, or it could be an L2 cache difference! My C2D has 3 MB L2 = 1.5 MB per core.

Does your CPU have 4 MB or 6 MB for its 4 cores (1 MB or 1.5 MB per core)?

Because of the heavy use of RAM transfers (picture filtering!), L2 size also matters a lot - the more L2 the better.

"Does your CPU have 4 MB or 6 MB for its 4 cores (1 MB or 1.5 MB per core)? ... the more L2 the better."

 

As you can see, each C2D die of the Quad has 1 MB more L2 available than your C2D.

 

post-249157-1274104206_thumb.png

 

(Disregard the bus speed indicated. CPU-X just reports what it is told. The Q6600 runs at 9x360.)

I got an answer from the dev team: benchpixel filtering uses only one GPU.

SLG (the raytracing) uses all GPUs.

So it's clear why dual-GPU results look lower in benchpixel than in SLG compared with a single-GPU card.

 

 

I got some Mac Pro 2009 ATI 4870 / GTX 285 results (slg 1.6.2).

The GTX 285 performs better than I guessed!

 

Bench UltraHigh GPU Only

Radeon HD 4870 = 54 sec

GeForce GTX 285 = 32 sec!! (for comparison: GT120 = 280 sec, 8800GTX = 100 sec)

Bench UltraHigh Hybrid

Radeon HD 4870 = 27 sec

GeForce GTX 285 = 25 sec

 

Bench GPU with OpenCL pixel filtering

none

Radeon HD 4870 = 1072Ms/s

GeForce GTX 285 = 945Ms/s

preview

Radeon HD 4870 = 219Ms/s

GeForce GTX 285 = 298Ms/s

gaussian

Radeon HD 4870 = 96Ms/s

GeForce GTX 285 = 167Ms/s

Core i7 920 @ 2.66 GHz + GTX 275

 

Ultrahigh GPU only = 36.2 sec

Highend GPU only = 17.6 sec

Midrange GPU only = 10.4 sec

 

Ultrahigh Hybrid = 29.9 sec

Highend Hybrid = 15.8 sec

Midrange Hybrid = 6.4 sec

 

Ultrahigh CPU only = 52.9 sec

Highend CPU only = 54.9 sec (?)

Midrange CPU only = 27.9 sec

 

Open CL Filtering

None = 333.30M/sec

Preview = 216.22M/sec

Gaussian = 140.13M/sec

 

Hope that's helpful at all. Let me know if there's anything else you want me to bench. ;)

Using a GTX 280 with a Core i5-750 @ 2.66 GHz (2 GB single-channel memory... yeah, I know, I'm getting another stick soon).

 

CPU Only Midrange: 41.3sec

CPU Only Highend: 83.4sec

CPU Only Ultra: 74.4sec

 

Hybrid Midrange: 7.0sec

Hybrid Highend: 14.7sec

Hybrid Ultra: 40.4sec

 

GPU Only Midrange: 11.2sec

GPU Only Highend: 16.4sec

GPU Only Ultra: 39.7sec

 

FILTER NONE: 875.99M

FILTER PREVIEW: 272.68M

FILTER GAUSSIAN: 142.42M

 

Man.. my {censored} is all over the place.

post-269528-1275028742_thumb.jpg

Yep - the GTX 280 has big advantages over the other Nvidias when running OpenCL.

 

"Ultrahigh CPU only = 52.9 sec

Highend CPU only = 54.9 sec (?)

Midrange CPU only = 27.9 sec

"

In the UltraHigh CPU-only (and Hybrid) benches, more CPU cores are used because more threads are run than in the Mid and Highend benches.

So on 4-core CPUs, UltraHigh profits from the extra CPU power and may even run faster than Highend CPU-only.

On C2D CPUs, UltraHigh CPU-only runs much slower.
