
All OpenCL Benches: RAYTRACING/Galaxies/Grass/qJulia/Displacement...


124 replies to this topic

#61
rushko

rushko

    InsanelyMac Protégé

  • Members
  • Pip
  • 9 posts
Hi Mitch_de,

I am trying to run the mandelbrot and nbody tests (CUDA_Nvidia_GPU_Benches), but I keep getting this error: [codebox]Rushkos-iMac:~ rushko$ cd /Users/rushko/Downloads/Benchmarks/CUDA_Nvidia_GPU_Benches/Mandelbrot_
Rushkos-iMac:Mandelbrot_ rushko$ /Users/rushko/Downloads/Benchmarks/CUDA_Nvidia_GPU_Benches/Mandelbrot_/Mandelbrot
dyld: Library not loaded: @rpath/libcudart.dylib
Referenced from: /Users/rushko/Downloads/Benchmarks/CUDA_Nvidia_GPU_Benches/Mandelbrot_/Mandelbrot
Reason: image not found
Trace/BPT trap
Rushkos-iMac:Mandelbrot_ rushko$
[/codebox]

What am I doing wrong? Thanks!

BTW new bench results for Gigabyte 250GTS OC:

galaxies 32k: 299Gflops.

Grass: 64-69fps

OpenCL Transpose Bandwidth: 44.811326 GB/sec

OpenCL QJulia and qjulia1024 are showing variable results

OpenCL aobench: CPU - 1.79fps max, GPU 12fps max

displacement won't work

smallluxGPU:

GPU: 3000-3100 rays/sec
CPU+GPU 3900+ rays/sec

#62
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
Thanks for sharing your results!
The big OpenCL advantage will come with the GTX 275 and up, the new Fermi cards, and the updated ATI 5xxx cards in the near future.
A GTX 285 will have at least triple the performance of an 8800 GT and nearly twice that of a GTS 250.
Fermi is focused much more on GPU computing than our cards, and more than the ATI 58xx too.
But even a 9400M may give a C2D CPU a 15-20% boost in things like rendering, which is like moving up one CPU speed grade.



Mandelbrot is a CUDA app (it does the same as the OpenCL version but is Nvidia-specific). You need to have the CUDA drivers installed.
But OpenCL will be more interesting in the near future. The main advantage of CUDA is that it can also be used with 10.5, whereas OpenCL is 10.6+ only.

There is a new smalllux version out (with a dragon scene); I updated the first post. Results with the same scene should be about the same, at least for the 8800 GT and GTS 250.

#63
rushko

rushko

    InsanelyMac Protégé

  • Members
  • Pip
  • 9 posts
Hi,

That's good news, Mitch. I just wonder when we can expect OpenCL-accelerated apps and, since modern GPUs have so much power, how it will affect CPUs...

Regarding mandelbrot, I installed the CUDA drivers 3.0.1 beta1 before I tried to start it, so I don't think that's causing the problem. Is there a way to test that CUDA is installed and working?
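
One rough way to check is to see whether the file named in the dyld error, libcudart.dylib, exists anywhere the loader might look. The sketch below is only an illustration, not an official test. It assumes the CUDA files live in /usr/local/cuda/lib (the usual install location as far as I know; adjust it if your installer used another path) and it also checks any directories listed in DYLD_LIBRARY_PATH. The function name find_cudart is made up for this example.

```python
import os
from pathlib import Path

# Assumption: /usr/local/cuda/lib is where the CUDA installer puts its
# libraries on Mac OS X. Change this list if your setup differs.
DEFAULT_DIRS = ["/usr/local/cuda/lib"]

def find_cudart(extra_dirs=()):
    """Return every existing libcudart.dylib in the candidate directories."""
    dirs = list(extra_dirs) + DEFAULT_DIRS
    # dyld also searches the colon-separated DYLD_LIBRARY_PATH entries.
    dirs += [d for d in os.environ.get("DYLD_LIBRARY_PATH", "").split(":") if d]
    return [str(Path(d) / "libcudart.dylib") for d in dirs
            if (Path(d) / "libcudart.dylib").is_file()]

if __name__ == "__main__":
    hits = find_cudart()
    if hits:
        print("libcudart.dylib found at:", ", ".join(hits))
    else:
        print("libcudart.dylib not found in the searched directories")
```

Finding the file does not prove that CUDA works, only that the runtime library is on disk. If it is there and the app still fails, running otool -L on the Mandelbrot binary lists the libraries it links against, which helps to see where it expects libcudart.dylib to be.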

#64
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany



Remember that CUDA, even the newest version, DOESN'T support the 64-bit kernel mode of Snow Leopard!

Four things may determine how fast (or why more slowly) OpenCL gets used in Mac OS X apps:
1. HW: The number of Macs with an OpenCL-ready GPU is, apart from the 9400M (which can do OpenCL but gives only a mini boost :thumbsup_anim: ), much smaller than on the PC side.
2. SW: Lots of Mac users are stuck with 10.5 (PPC plus some Intel too), so no OpenCL.
3. HW: Only 8800 GT and GTX 285 users on real Macs get a real benefit. The ATI 4850/4870 OpenCL driver needs many bug fixes, or the OpenCL code must be made much more ATI-friendly, which is a lot of work.
4. SW: The OpenCL coding background (code must be parallelized) is new for most OS X devs; on the Windows/Linux side, devs have already learned it better by using CUDA.
So maybe Apple itself will lead the way with OpenCL-boosted apps like iLife (iDVD, iMovie) and do some tasks with OpenCL besides Core Image.
Other OS X companies will surely learn and implement OpenCL in beta tests, but I think that will take at least until summer 2010 (10.6.4+).
Most of the interest in GPU programming comes from universities, which have a lot of GPU coding knowledge but less money (for "Crays" and big clusters).
Very interesting is FASTRAII, a high-end PC from a university in Belgium that is used as a number cruncher for medical tasks.
They built a special GPU compute server with 6 GTX 295 cards (=2*275 each)!!! That's 12 GPUs for CUDA/OpenCL for less than 8000 US$, much faster than a Linux cluster that costs more than 20 times as much. It also needs much, much less power and cooling than such Linux clusters: FASTRAII uses up to 1200 W, which is a lot for us but peanuts compared to a bigger, much slower and much more expensive Linux cluster!
It's CUDA, but that's really nearly the same coding as for OpenCL.

FASTRAII !!!!


#65
olaszvandor

olaszvandor

    InsanelyMac Protégé

  • Members
  • Pip
  • 10 posts

W3520 overclocked to 4.1GHz Turbo, PCIe 102MHz, 1280x1024 at 75Hz, standard scene

GTX-285 - 469 updates/sec, 157 Gigaflops
Vector Multi core - 262 updates/sec, 87 Gigaflops


great

#66
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
New version of smalllux (OpenCL raytracing). Now with highly optimized CPU + GPU OpenCL code.
Modern GPUs (GTX 260+, ATI 48xx) are much faster than older ones here because this is very complex, high-performance OpenCL number crunching. A good real-world test.
Should also run on ATI 48xx.

DOWNLOAD: first post!

The goal of smalllux is to become an add-on to the luxrender (open source) raytracing software (Mac OS X version available!).

More information:
GPU smalllux
http://www.luxrender.net/

#67
mrheat

mrheat

    InsanelyMac Protégé

  • Members
  • PipPip
  • 50 posts
  • Location:Bavaria / Germany
great stuff mitch,

finally reaching 447W Total Power Consumption on:

q6600@3,2
asrock p43de
4gb ddr2-800
radeon hd 4870 1GB


So we can stress test the system as hard as in Windows.

#68
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany



Have you used the new smalllux (CPU + GPU)?
What values do you get using the dragon scene at 640x480 with 4 cores + GPU?
I get 2430 KRays/sec and 780 KSamples/sec. For my 2-core CPU it's better to use 2 cores + GPU: 3-5% more KRays squeezed out of the system. Because the code is very highly optimized (it loads each core heavily), it's best to set the core count to the CPU's native core count. Apps that don't put much load on each core may run faster with 4+ threads on a 2-core system. Not smalllux: it always puts a lot of load on each core.
Those values, KSamples/sec and KRays/sec, are the benchmark values: the higher, the better/faster.
As said, on modern GPUs the GPU takes a much larger share of the work. A GTX 285 may give up to 3 times more GPU KSamples than an 8800 GTX or 8800 GT. I don't know how the ATI 4870 performs here, so 4870/4890 values are very interesting.
Also, the next update, 10.6.3 (coming soon), will perhaps give even better GPU OpenCL speed.




#69
mikoffski

mikoffski

    InsanelyMac Protégé

  • Just Joined
  • Pip
  • 4 posts
  • Gender:Male
  • Location:Brisbane, Australia
Core i7 920 2.66Ghz + Radeon HD4850

Dragon scene 640x480 8 Cores + GPU

5890 Krays/sec
2105 Ksamples/sec

4850 GPU only workgroup size 64

1778 Krays/sec
590 KSamples/sec

#70
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
THANKS for your values!
Even the 4850 performs better than the older high-end Mac card, the 8800 GT.

Now the new version 1.4B1 is out (download in the first post), which has enhanced material support and even better light support.
It can now also handle materials like glass and gold.
Because of changes in the scene files there are 2 new scenes: classroom and Little Buddha.


The screenshot shows Little Buddha on my 8800GTX (2 cores + GPU): 922 KSamples/sec (3718 KRays/sec)




#71
jeanlain

jeanlain

    InsanelyMac Protégé

  • Members
  • Pip
  • 31 posts
Hi, I tried the latest version:

With the Little Buddha scene (640*480) at workgroup size 64, I get 960 Ksamples/sec and 3900+ Krays/sec.
I've got a 2.66 Mac Pro Quad (2007) and a Radeon HD 4870.
Seems quite a bit better than the 8800GTX results (GPU only) you posted as a reference. I guess this test shows the superiority of the Radeon HD 4xxx series over older cards, and that recent AMD GPUs can be good at OpenCL (it may just be a matter of drivers).

OTOH, this test pegs the GPU so hard that the Dock and Exposé become barely usable. :D

Cheers, ;)

#72
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
Thanks for your GPU-only results!
Indeed, the 4870 is much faster than the 8800 GT, and even faster than the 8800 GTX (which has more shaders than the 8800 GT).
But the real power is unleashed with an up-to-date ATI 5870 or 2x GTX 285. They are much faster than the 4870 because their hardware design focuses more on GPU compute. The upcoming GTX 480/470 will also work best with OpenCL, even if in games they will not be much faster than the 5870.
But even on a slower GPU you gain the equivalent of at least 1-2 extra CPU cores with that task.
I updated the version to V2: the classroom scene now has much better looking chrome reflections (chairs), which should not affect the speed.

#73
jeanlain

jeanlain

    InsanelyMac Protégé

  • Members
  • Pip
  • 31 posts
Thanks for the info.

With GPU + 4 CPUs I get 1480 Ksamples/sec and 5900 Krays/sec on the Buddha scene (results above were for GPU only).

Cheers.

#74
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
You can see that the highly optimized smalllux handles hybrid mode (CPU cores + GPUs) much better than Apple's Galaxies demo.
For luxrender or Blender usage (that's the goal of the project) it's an afterburner, at least with a fast ATI 58xx or GTX 295 (=2*275) on quad-core or bigger CPUs. For example, a Mac Pro 2009 needs a much faster GPU to get a 30%+ boost from OpenCL; for that much CPU speed, the ATI 4870 is the minimum GPU for OpenCL.
For a significant boost with C2D CPUs, an ATI 4870 or GTX 285 is enough.

Have you tried the classroom scene with GPU only / 4 cores + GPU? It would be nice to see ATI 4870 results :)

With the 8800GTX, GPU only, I get 123 KSamples/sec and 677 KRays/sec in the classroom scene. With 2 cores + 8800GTX: 330 KSamples/sec, 2222 KRays/sec.
Your ATI will also perform better in that scene, I think.

#75
jeanlain

jeanlain

    InsanelyMac Protégé

  • Members
  • Pip
  • 31 posts


Yes, I get 550 KSamples/s and 2650 KRays/s GPU only (256 workgroupsize), after about 1 min of test. That's 4 times better than the 8800GTX. :wacko:
Note that I have a flashed Radeon with 1GB Vram, and probably standard GPU and VRAM clockspeeds (i.e., not crippled by Apple).
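
For anyone who wants the exact ratios behind the "4 times" remark, here is a quick sketch of the arithmetic using only the classroom scene, GPU-only figures quoted in this thread (550 KSamples/s and 2650 KRays/s for the HD 4870, 123 and 677 for the 8800GTX). The helper name speedup is just made up for this example.

```python
def speedup(new, old):
    """Ratio of two throughput figures (higher is better for both)."""
    return new / old

# Classroom scene, GPU only, figures quoted in this thread.
hd4870 = {"ksamples": 550, "krays": 2650}
gtx8800 = {"ksamples": 123, "krays": 677}

for metric in ("ksamples", "krays"):
    print(f"{metric}: {speedup(hd4870[metric], gtx8800[metric]):.2f}x")
# prints:
# ksamples: 4.47x
# krays: 3.91x
```

So it is roughly 4.5 times on samples and a bit under 4 times on rays. Keep in mind the two runs may have used different workgroup sizes, so treat the ratio as a ballpark figure.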

#76
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
GREAT classroom KSamples/sec!
I don't think your flashed GPU runs at a reduced clock speed. Usually, if something is wrong, it "hangs" at full clock speed and throttling doesn't work, not the other way around :(

HINT: don't read the wrong values.

Example screenshot: the raytracing is still running (red values) and the scene isn't rendered yet (samples = 0!). Wait until you see at least 32 samples; then you can also see the avg KSamples/sec (white values at the bottom)! That means it can take more than 1 minute (with GPU only) until the result values are there (complete scene rendered).
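
If you note down several readings during a run, the rule above (ignore everything before 32 samples) can be sketched like this. It is only an illustration: the function name, the (samples, avg KSamples/sec) pair format and the numbers in the example list are made up, not output from smalllux.

```python
MIN_SAMPLES = 32  # threshold suggested in the hint above

def first_valid_reading(readings, min_samples=MIN_SAMPLES):
    """Return the first avg KSamples/sec whose sample count is high enough.

    `readings` is an iterable of (samples, avg_ksamples_per_sec) pairs,
    e.g. noted down by hand from the on-screen display.
    """
    for samples, avg_ksamples in readings:
        if samples >= min_samples:
            return avg_ksamples
    return None  # scene not rendered far enough yet

# Made-up readings for illustration only, not real benchmark data.
log = [(0, 0.0), (8, 410.0), (32, 545.0), (64, 550.0)]
print(first_valid_reading(log))  # prints 545.0
```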




#77
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
New version uploaded (link in post #1)!
V1.4.3
luxball scene added (now default), GUI fixes, updated slg to 1.4.3

Your samples/sec and rays/sec are not comparable to those from earlier versions!

#78
jeanlain

jeanlain

    InsanelyMac Protégé

  • Members
  • Pip
  • 31 posts
Hey, seems like 10.6.3 has improved speed a bit.
On the same settings as above, I get 1100 Ksamples/sec and 4050 Krays/sec on the buddha scene (GPU alone). Comparable improvements on the classroom scene. :)
Need to test the newer version.

#79
mitch_de

mitch_de

    InsanelyMacaholic

  • Retired
  • 2,885 posts
  • Gender:Male
  • Location:Stuttgart / Germany
From now on you will always find the newest version more easily at macupdate.com
smallluxGPU 1.4.5++
Soon there will be a 1.5 Beta1.

#80
jeanlain

jeanlain

    InsanelyMac Protégé

  • Members
  • Pip
  • 31 posts
1400 Ksamples/sec and 2900 Krays/s (luxball), GPU only in v. 1.4.5.
With the CPUs, the radeon is only at 50% load (was always near 100% in V. 1.41, which made OS X interface really sluggish).
Why is the radeon 4870 so much better than the 8800GTX at this test, but slower at the galaxy one?




