
All OpenCL Benches: RAYTRACING/Galaxies/Grass/qJulia/Displacement...


mitch_de

125 posts in this topic


Hey, everybody. I wanted to post this link in here, it's to an iTunes Visualizer that makes an OpenCL fluid simulation dance to your music:

 

http://www.mutantquartz.com/?p=40

 

 

(No, I did not make it, that is not my blog. I just found it when the author posted it on the MacRumors forums).

 

Summary of the important-for-benchmarking controls for those who don't feel like referring to the link constantly:

 

F - Toggle framerate counter

X - Toggle hi-res mode (requires reset to apply)

R - Reset simulation

H - Toggle height-mapping mode

 

 

Oh, and it has a memory leak. Not that big of a deal, you just need to restart iTunes to clear it, but be forewarned. Also, it didn't use to work on ATI, but that may have changed in 10.6.2; I don't have an ATI card to test it with.

 

Some numbers from my MBP's 256MB 9600M GT on 10.6.2 (all of these are with music sensitivity, speed, and color set at the defaults):

 

Normal mode: ~18 fps

Height-map: ~18 fps

High-res: slideshow (didn't even try height-map)


  • 3 weeks later...
On an ATI Radeon HD 4670: ~9 fps in 2D and height-map modes. High-res is awful. CPU use is oddly high, ~177%.

 

MacBook Pro 2.4 GHz, 2 GB RAM, 9600M GT 256 MB

 

Earfluid 0.1

About 25 fps in 2D and height-map modes. High-res drops it to under 10 fps. CPU use just under 50%.

 

Galaxies 32K V2

About 50 GFLOPS

 

MacBook (Aluminium) 2 GHz, 4 GB RAM, 9400M

 

Earfluid 0.1

About 11 fps in 2D and height-map modes. High-res drops it to under 5 fps. CPU use just under 40%.

Not as bad as I thought. This 9400M ain't that shabby compared to the 9600M GT. Definitely better than anything Intel has offered in years.

 


Nice find ;) Not sure why your fps is lower than mine, though.


  • 1 month later...

Yep!

Galaxies 32K, NV 8800 GTX: 197 GFLOPS.

 

NEW OpenCL Raytracing Benchmark (updated in the first post)

smallluxGPU

Does ray tracing on GPU only, GPU+CPU, or CPU only.

It is a very complex (real-life) workload, so weak GPUs gain less here than in the simpler, lower-level OpenCL demos.

Its hybrid (CPU+GPU) mode works much better than Galaxies: even an NV 9400M is worth using and adds a 15-20% speed gain on top of the CPU!

Uses ALL OpenCL GPUs it finds (up to 4).

Also works with ATI 48xx GPUs.

More details in the first post!


Hi mitch_de

 

Well done on keeping up to date with the latest benchmarks :(

 

Specs:

Video: 8800GT 512MB OC

CPU: E7300 @ 2.66 GHz

 

Results for the new OpenCL raytracing benchmark:

GPU only: ./smallluxGPU 0 0 1 64 = Avg. rays/sec 2755K

CPU + GPU: ./smallluxGPU 2 0 1 64 = Avg. rays/sec 3160K

CPU only: ./smallluxGPU 2 0 0 64 = Avg. rays/sec 540K

Note: The CPU-only test started at 707K but settled at 540K after 36 seconds.
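To put these three numbers in perspective, here is a small sketch of the arithmetic, using only the rays/sec values reported in this post. "Efficiency" here is simply the hybrid result divided by the sum of the separate GPU-only and CPU-only runs; it is a rough measure and assumes the three runs are comparable.

```python
# Avg. rays/sec (in thousands) reported above: 8800 GT + E7300, workgroup size 64.
gpu_only = 2755
cpu_gpu = 3160
cpu_only = 540


def hybrid_gain(hybrid: float, gpu: float) -> float:
    """Percent speed-up of the hybrid (CPU + GPU) run over the GPU-only run."""
    return (hybrid / gpu - 1.0) * 100.0


def hybrid_efficiency(hybrid: float, gpu: float, cpu: float) -> float:
    """Hybrid throughput as a percentage of the ideal sum GPU-only + CPU-only."""
    return hybrid / (gpu + cpu) * 100.0


if __name__ == "__main__":
    print(f"CPU+GPU vs GPU only: +{hybrid_gain(cpu_gpu, gpu_only):.1f}%")
    print(f"Hybrid efficiency: {hybrid_efficiency(cpu_gpu, gpu_only, cpu_only):.1f}%")
```

With these figures the hybrid run is about 14.7% faster than GPU only and reaches roughly 96% of the ideal sum of the two separate runs.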


It's normal for the rays/sec figure to need some time to stabilize (i.e. until it changes less from moment to moment).

Users may change the workgroup size from 64 to 128, 256 or 32. Workgroup size is an OpenCL parameter whose best value depends on the GPU. A larger workgroup size may speed up the GPU-only part on faster, modern GPUs.

But mostly a bigger workgroup size will NOT change the GPU-only speed significantly, at least not on GPUs like mine (8800 GT); a GTX 260/GTX 275/GTX 285 may perform better with 128 or 256.

Too big a workgroup size may slow down or even crash the OpenCL program.
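Why can a too-large workgroup size make things fail? In OpenCL 1.x, clEnqueueNDRangeKernel rejects a local (workgroup) size that is larger than the device's CL_DEVICE_MAX_WORK_GROUP_SIZE, and also one that does not evenly divide the global work size (both give CL_INVALID_WORK_GROUP_SIZE). Host code therefore usually rounds the global size up to a multiple of the workgroup size. The sketch below only mimics those checks in plain Python; the device limit of 512 and the one-work-item-per-pixel 640x480 layout are hypothetical example values, not smallluxGPU's actual settings.

```python
from typing import Tuple


def round_up(global_size: int, local_size: int) -> int:
    """Smallest multiple of local_size that is >= global_size."""
    return ((global_size + local_size - 1) // local_size) * local_size


def plan_launch(global_size: int, local_size: int, device_max: int) -> Tuple[int, int]:
    """Return (padded global size, number of workgroups) for a 1-D launch.

    Raises ValueError in the case where an OpenCL 1.x enqueue would fail with
    CL_INVALID_WORK_GROUP_SIZE because local_size exceeds the device limit.
    device_max stands in for CL_DEVICE_MAX_WORK_GROUP_SIZE (hypothetical value).
    smallluxGPU's "0 = default" (let the driver choose) is not modelled here.
    """
    if local_size <= 0:
        raise ValueError("this sketch needs an explicit workgroup size > 0")
    if local_size > device_max:
        raise ValueError(f"workgroup size {local_size} exceeds device maximum {device_max}")
    padded = round_up(global_size, local_size)
    return padded, padded // local_size


if __name__ == "__main__":
    work_items = 640 * 480  # hypothetical: one work-item per pixel of a 640x480 frame
    for wg in (32, 64, 128, 256, 1024):
        try:
            print(wg, plan_launch(work_items, wg, device_max=512))
        except ValueError as err:
            print(wg, "rejected:", err)
```

When the global size is padded like this, the kernel itself has to ignore the extra work-items (typically with an index check against the real size).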

Example:

./smallluxGPU 0 0 1 64

./smallluxGPU 0 0 1 256 (workgroup size changed from 64 to 256)

 

 

Also, if you want to "see what's being done", you can switch to interactive mode:

./smallluxGPU 0 0 0 1 0 640 480 scenes/luxball.scn (GPUs only, workgroup size 0 = default)

./smallluxGPU 0 2 0 1 0 640 480 scenes/luxball.scn (2 CPU threads + GPUs, workgroup size 0 = default)

Interactive mode usually gives slightly lower rays/sec than benchmark mode, because the OpenCL app must also handle all the screen output!
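For reference, the interactive examples above match the argument order printed in the usage message quoted later in this thread (low latency, native thread count, use CPU device, use GPU device, workgroup size, width, height, scene file); that usage message also lists a halt-time argument before the scene file, which these examples do not pass. The four-number benchmark form (thread count, CPU device, GPU device, workgroup size) is only inferred from the GPU-only / CPU+GPU / CPU-only examples in this thread, not from smallluxGPU documentation. A small helper that assembles both forms might look like this; the function names are hypothetical.

```python
from typing import List


def benchmark_args(threads: int, use_cpu_device: bool, use_gpu: bool, workgroup: int) -> List[str]:
    """Benchmark form as inferred from this thread, e.g. ['0', '0', '1', '64'] = GPU only."""
    return [str(threads), str(int(use_cpu_device)), str(int(use_gpu)), str(workgroup)]


def interactive_args(threads: int, use_cpu_device: bool, use_gpu: bool,
                     workgroup: int = 0, width: int = 640, height: int = 480,
                     scene: str = "scenes/luxball.scn",
                     low_latency: bool = False) -> List[str]:
    """Interactive form as in the examples above (workgroup size 0 = default)."""
    return [str(int(low_latency)), str(threads), str(int(use_cpu_device)),
            str(int(use_gpu)), str(workgroup), str(width), str(height), scene]


if __name__ == "__main__":
    print(" ".join(["./smallluxGPU"] + benchmark_args(2, False, True, 64)))
    print(" ".join(["./smallluxGPU"] + interactive_args(2, False, True)))
```

The second line printed reproduces the "2 CPU threads + GPUs" interactive example above.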


Hello

 

Snow Leopard 10.6.2, 64-bit, NVIDIA GTS 250

I get this message when I try smallluxGPU:

 

<low latency mode enabled (0 or 1)> <native thread count> <use CPU device (0 or 1)> <use GPU device (0 or 1)> <GPU workgroup size (0=default value or anything > 0)> <window width> <window height> <halt time in secs> <scene file>
Reading scene: scenes/simple.scn
terminate called after throwing an instance of 'std::ios_base::failure'
 what():  basic_ios::clear
Abort trap


PLEASE READ THE "HOW TO RUN" FILE inside the zip.

It's a command-line / Terminal app; it doesn't run by double-clicking it!!!

You must start Terminal and change directory to the main folder of the command-line app.

You must pass command-line options to the app!!!

Run the benchmark, GPU only (Ctrl+C to abort):

./smallluxGPU 0 0 1 64

You may also use 128 or 256 instead of the 64 above to get more rays/sec, but that only makes sense on very modern GPUs (GTX 275+).

Run interactive mode, GPU only:

./smallluxGPU 0 0 0 1 0 640 480 scenes/luxball.scn
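The "std::ios_base::failure" right after "Reading scene: scenes/simple.scn" fits a scene file not being found relative to the current directory, which is likely why the app has to be started from its main folder. Below is a hedged sketch of a Python launcher that checks for the scene file first and then runs the binary with the app folder as working directory. The helper names are hypothetical; the "cd" step is done by the cwd parameter of subprocess.run.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def missing_scene(app_dir: Path, scene: str) -> Optional[Path]:
    """Return app_dir/scene if that file does NOT exist, otherwise None."""
    path = app_dir / scene
    return None if path.is_file() else path


def run_smallluxgpu(app_dir: Path, args: List[str], scene: str) -> int:
    """Hypothetical launcher: check the scene file, then run ./smallluxGPU from app_dir."""
    missing = missing_scene(app_dir, scene)
    if missing is not None:
        raise FileNotFoundError(f"scene file not found: {missing}")
    # cwd=app_dir does the "cd to the main folder" step, so relative
    # paths such as scenes/luxball.scn resolve against the app folder.
    completed = subprocess.run(["./smallluxGPU", *args, scene], cwd=app_dir)
    return completed.returncode
```

For example, run_smallluxgpu(Path.home() / "smallluxGPU", ["0", "0", "0", "1", "0", "640", "480"], "scenes/luxball.scn") would start the GPU-only interactive mode, where the folder name is a made-up example location.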


Hi Mitch_de,

 

I'm trying to start the Mandelbrot and nbody tests (CUDA_Nvidia_GPU_Benches), but I keep getting this error:

Rushkos-iMac:~ rushko$ cd /Users/rushko/Downloads/Benchmarks/CUDA_Nvidia_GPU_Benches/Mandelbrot_ 
Rushkos-iMac:Mandelbrot_ rushko$ /Users/rushko/Downloads/Benchmarks/CUDA_Nvidia_GPU_Benches/Mandelbrot_/Mandelbrot 
dyld: Library not loaded: @rpath/libcudart.dylib
 Referenced from: /Users/rushko/Downloads/Benchmarks/CUDA_Nvidia_GPU_Benches/Mandelbrot_/Mandelbrot
 Reason: image not found
Trace/BPT trap
Rushkos-iMac:Mandelbrot_ rushko$ 

 

What am I doing wrong? Thanks!
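The dyld message means the loader could not find libcudart.dylib (the CUDA runtime library) through the binary's @rpath. That library normally comes with the CUDA toolkit rather than the bare driver, and NVIDIA's Mac setup notes of that era suggested adding the toolkit's lib folder to DYLD_LIBRARY_PATH. The default folder /usr/local/cuda/lib used below is an assumption about a standard install. This sketch only searches candidate folders so you can see whether the library is present at all; the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional

# Assumption: default Mac OS X CUDA toolkit library folder of that era.
DEFAULT_CANDIDATES = ["/usr/local/cuda/lib"]


def candidate_dirs(env: Mapping[str, str]) -> List[str]:
    """DYLD_LIBRARY_PATH entries (if any), followed by the assumed default folder."""
    extra = [d for d in env.get("DYLD_LIBRARY_PATH", "").split(":") if d]
    return extra + DEFAULT_CANDIDATES


def find_libcudart(candidates: Iterable[str]) -> Optional[Path]:
    """Return the first <folder>/libcudart.dylib that exists, else None."""
    for folder in candidates:
        lib = Path(folder) / "libcudart.dylib"
        if lib.is_file():  # follows symlinks, which CUDA installs commonly use
            return lib
    return None


if __name__ == "__main__":
    found = find_libcudart(candidate_dirs(os.environ))
    if found is not None:
        print(f"found {found}; try: export DYLD_LIBRARY_PATH={found.parent}:$DYLD_LIBRARY_PATH")
    else:
        print("libcudart.dylib not found; the CUDA toolkit may not be installed")
```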

 

BTW, new bench results for a Gigabyte GTS 250 OC:

Galaxies 32K: 299 GFLOPS.

Grass: 64-69 fps

OpenCL Transpose Bandwidth: 44.811326 GB/sec

OpenCL QJulia and qjulia1024 show variable results.

OpenCL aobench: CPU 1.79 fps max, GPU 12 fps max

Displacement won't work.

smallluxGPU:

GPU: 3000-3100K rays/sec

CPU+GPU: 3900K+ rays/sec


Thanks for sharing your results!

GTX 275+ cards, the new Fermi and updated ATI 5xxx cards will have a big OpenCL advantage in the near future.

A GTX 285 will have at least triple the performance of an 8800 GT and nearly twice that of a GTS 250.

Fermi is much more focused on GPU computing than our cards, and also more than the ATI 58xx.

But even a 9400M can give a C2D CPU a 15-20% boost in tasks like rendering, which saves you one CPU speed grade.

The Mandelbrot bench is a CUDA app (it does the same kind of work as OpenCL, but is NVIDIA-specific). You must have the CUDA drivers installed.

But OpenCL will be more interesting in the near future. The main advantage of CUDA is that it can also be used with 10.5, whereas OpenCL is 10.6+ only.

There is a new smalllux version out (with a dragon scene); I updated the first post. Results with the same scene should be the same, at least for the 8800 GT / GTS 250.


Hi,

 

That's good news, Mitch. I just wonder when we can expect OpenCL-accelerated apps and, since modern GPUs have so much power, how that will affect CPUs...

Regarding Mandelbrot, I had installed the CUDA 3.0.1 beta1 drivers before I tried to start it, so I don't think that's the problem. Is there a way to test that CUDA is installed and working?

 

Remember that CUDA, even the newest version, DOESN'T support Snow Leopard's 64-bit kernel mode!
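If you are not sure which kernel you booted: on Snow Leopard, `uname -m` reports x86_64 when the 64-bit kernel is running and i386 with the 32-bit kernel, and Python's platform.machine() returns that same field. A minimal sketch, assuming that mapping holds on your machine (the helper name is hypothetical):

```python
import platform


def kernel_is_64bit(machine: str) -> bool:
    """True if uname's machine field indicates a 64-bit x86 kernel."""
    return machine == "x86_64"


if __name__ == "__main__":
    machine = platform.machine()
    if kernel_is_64bit(machine):
        print("64-bit kernel: per the note above, CUDA may not work in this mode")
    else:
        print(f"kernel reports {machine!r}")
```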

 

Four things may determine how fast (or why slowly) OpenCL gets adopted in Mac OS X apps:

1. HW: Apart from the 9400M (which can do OpenCL, but gives only a mini boost), the number of Macs with OpenCL-ready GPUs is much smaller than on the PC side.

2. SW: Lots of Mac users are stuck on 10.5 (PPC, and some Intel too), so no OpenCL.

3. HW: Among real Macs, only 8800 GT and GTX 285 users get a real benefit. The ATI 4850/4870 OpenCL driver needs many bug fixes, or the OpenCL code must be made very ATI-friendly, which is a lot of work.

4. SW: OpenCL coding (code must be parallelised) is new for most OS X devs; on the Windows/Linux side, devs have already learned it better by using CUDA.

So maybe Apple itself will lead with OpenCL-boosted apps like iLife (iDVD, iMovie) and do some tasks with OpenCL besides Core Image.

Other OS X companies will surely learn and implement OpenCL in betas, but I think that will take at least until summer 2010 (10.6.4+).

The most interest in GPU programming comes from universities, which have a lot of GPU coding knowledge and less money (for "Crays" and big clusters).

Very interesting is FASTRA II, a high-end PC from a university in Belgium used as a number cruncher for medical tasks.

They built a special GPU compute server with 6 GTX 295s (each = 2*GTX 275)!!! That's 12 GPUs for CUDA/OpenCL for less than 8000 US$, and much faster than a Linux cluster costing more than 20 times as much. It also has much, much lower power consumption and cooling needs than such Linux clusters: FASTRA II uses up to 1200 W, which is a lot for us but peanuts compared with a bigger, much slower and much more expensive Linux cluster!

It's CUDA, but that's really very close to coding for OpenCL.

FASTRA II !!!!

 

fastra_small.jpg

graph_reconstruction.png


  • 2 weeks later...

New version of smalllux (OpenCL ray tracing). Now with highly optimized CPU + GPU OpenCL code.

Modern GPUs (GTX 260+, ATI 48xx) are much faster here than older ones, because this is very complex, high-performance OpenCL number crunching. A good real-world test.

Should also run on ATI 48xx.

DOWNLOAD: first post!

The goal of smalllux is to become an add-on to the LuxRender (open source) ray tracing software (Mac OS X version available!).

More information:

GPU smalllux

http://www.luxrender.net/


Great stuff, Mitch,

finally reaching 447 W total power consumption on:

Q6600 @ 3.2 GHz

ASRock P43DE

4 GB DDR2-800

Radeon HD 4870 1 GB

So we can stress-test the system as hard as in Windows.

 

Have you used the new smalllux (CPU + GPU)?

What values do you get with the dragon scene at 640x480, 4 cores + GPU?

I get 2430 KRays/sec and 780 KSamples/sec. On my 2-core machine it's better to use 2 cores + GPU: that squeezes 3-5% more KRays out of the system. Because the code is very highly optimized (it loads each core heavily), it's best to set the thread count to the CPU's native core count. Apps that don't put much load on each core may run faster with 4+ threads on a 2-core system, but not smalllux: it always puts a very heavy load on each core.

Those values, KSamples/sec and KRays/sec, are the benchmark values: the higher, the better/faster.
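Since both values are reported, dividing KRays/sec by KSamples/sec gives the average number of rays traced per sample. A short sketch using only figures posted in this thread shows that this ratio varies by scene and by run, so it makes sense to compare like with like:

```python
# (KRays/sec, KSamples/sec) pairs as posted in this thread.
RESULTS = {
    "dragon, 2 cores + GPU": (2430, 780),
    "Little Buddha, 2 cores + 8800 GTX": (3718, 922),
    "Classroom, 8800 GTX only": (677, 123),
    "Classroom, 2 cores + 8800 GTX": (2222, 330),
}


def rays_per_sample(krays: float, ksamples: float) -> float:
    """Average rays traced per sample (both inputs in thousands, so the units cancel)."""
    return krays / ksamples


if __name__ == "__main__":
    for name, (krays, ksamples) in RESULTS.items():
        print(f"{name}: {rays_per_sample(krays, ksamples):.2f} rays/sample")
```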

As I said, modern GPUs take a much larger share of the work. A GTX 285 may give up to 3 times more GPU KSamples than an 8800 GTX or 8800 GT. I don't know how the ATI 4870 performs here, so 4870/4890 values are very interesting.

Also, the upcoming 10.6.3 (soon) will perhaps give even better GPU OpenCL speed.

Bildschirmfoto_2010_02_17_um_13.26.36.jpg


THANKS for your values!

Even a 4850 performs better than the older high-end Mac 8800 GT.

Now the new version 1.4B1 is out (download in the first post), with enhanced material support and even better light support.

It can now also handle materials such as glass/gold...

Because of changes in the scene files, there are 2 new scenes: Classroom and Little Buddha.

The screenshot shows Little Buddha on my 8800 GTX (2 cores + GPU): 922K samples/sec (3718 KRays/sec).

happybuddha.jpg


Hi, I tried the latest version:

With the Little Buddha scene (640x480) and workgroup size 64, I get 960 KSamples/sec and 3900+ KRays/sec.

I've got a 2.66 GHz Mac Pro Quad (2007) and a Radeon HD 4870.

Seems quite a bit better than the 8800 GTX results (GPU only) you posted as a reference. I guess this test shows the superiority of the Radeon HD 4xxx series over older cards, and that recent AMD GPUs can be good at OpenCL (it may just be a matter of drivers).

OTOH, this test pegs the GPU so hard that the Dock and Exposé become barely usable. :D

Cheers ;)


Thanks for your GPU-only results!

Indeed, the 4870 is much faster than the 8800 GT, and even faster than the 8800 GTX (which has more shaders than the 8800 GT).

But the real power is unleashed with an up-to-date ATI 5870 or 2*GTX 285: much faster than the 4870, because their hardware design focuses more on GPU compute. The upcoming GTX 480/470 will also work best with OpenCL, even if in games they won't be much faster than the 5870.

But even with a slower GPU you get the equivalent of at least 1-2 extra CPU cores for this task.

I updated the version to V2: the Classroom scene now has much better-looking chrome reflections (chairs), which should not affect speed.


You can see that the highly optimized smalllux does hybrid (CPU cores + GPUs) much better than Apple's Galaxies demo.

For LuxRender or Blender use (that's the goal of the project) it's an afterburner, at least with a fast ATI 58xx or GTX 295 (=2*275) on quad+ CPUs. For example, a 2009 Mac Pro needs a much faster GPU to get a 30%+ boost with OpenCL; with that much CPU speed, an ATI 4870 is the minimum GPU worth using for OpenCL.

For a significant boost with C2D CPUs, an ATI 4870 or GTX 285 is enough.

Have you tried the Classroom scene with GPU only / 4 cores + GPU? It would be nice to see ATI 4870 results :)

In the Classroom scene I get 123 KSamples/sec and 677 KRays/sec with the 8800 GTX, GPU only. With 2 cores + 8800 GTX: 330 KSamples/sec and 2222 KRays/sec.

I think your ATI will also perform better in that scene.


Yes, I get 550 KSamples/s and 2650 KRays/s GPU only (workgroup size 256), after about 1 min of testing. That's about 4 times better than the 8800 GTX. :wacko:

Note that I have a flashed Radeon with 1 GB VRAM, and probably standard GPU and VRAM clock speeds (i.e., not crippled by Apple).
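A quick check of the "about 4 times" figure with the Classroom GPU-only numbers posted in this thread: it holds for KRays/sec, while in KSamples/sec the gap is closer to 4.5x. Note that the two runs are not a strictly controlled comparison (workgroup size 256 on the 4870, not stated for the 8800 GTX).

```python
def speedup(new: float, old: float) -> float:
    """How many times faster `new` is than `old` (same units)."""
    return new / old


# Classroom scene, GPU only, as posted: (KSamples/sec, KRays/sec).
HD4870 = (550, 2650)  # workgroup size 256
GTX8800 = (123, 677)  # workgroup size not stated

samples_x = speedup(HD4870[0], GTX8800[0])
rays_x = speedup(HD4870[1], GTX8800[1])
print(f"KSamples/sec: {samples_x:.2f}x, KRays/sec: {rays_x:.2f}x")
```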
