NEW OpenCL Raytracing Benchmark (first post updated)
Raytraces on the GPU, on GPU+CPU, or on the CPU only.
Very complex (real-life) computing, so weak GPUs gain less of an advantage here than in the more low-level OpenCL demos.
Handles hybrid (CPU+GPU) mode much better than Galaxies.
Uses ALL OpenCL GPUs it finds (up to 4) in parallel.
Also works with ATI 48XX GPUs.
Update to V170 (always the same link)
Major update with a console tab (it shows the information the GUI shows, plus more detail and any errors).
Happy benching (type 0 times are unchanged; type 1 may be a little faster).
Older stuff / mostly less real-world than smalllux!
Galaxies32K V2 + Galaxies 8K V2 + Grass + Displacement
+ AO (raytracing CPU/GPU) + Transpose Bandwidth
Snow Leopard + Intel Macs ONLY !
ATI OpenCL GPUs (4850 & 4870) are not really working! I am in contact with the ATI devs. There are problems with the OpenCL drivers/framework, which must and will be fixed with 10.6.1 or an ATI driver update.
HOT NEWS - always updated here
- New Galaxies OpenCL Bench V2 build
- Apple updated / fixed some OpenCL API usage (maybe help ATI)
- small speed-up (10% on my 9600 GT)
I now build a 32K and an 8K version: use 32K for fast/high-end GPUs/CPUs and 8K for low-end CPUs/GPUs.
If the GPU is the limiting factor, GPU gigaflops will not differ between the two. On very fast GPUs, though, 32K may give much higher GPU gigaflops: more GPU load means less time wasted on OpenCL overhead.
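A rough way to see why the 32K build can report more gigaflops on a fast GPU: if every step pays a fixed OpenCL overhead and the work grows with the square of the body count (as in an all-pairs N-body step), the measured rate moves closer to the GPU's peak as the count grows. This is only a sketch. The 20 flops per interaction, the peak rates and the overhead value are illustrative assumptions, not numbers measured from Galaxies.

```python
# Hypothetical model: GFLOPS reported when each step pays a fixed overhead.
FLOPS_PER_INTERACTION = 20  # assumption: a common N-body convention, not taken from Galaxies


def measured_gflops(bodies, peak_gflops, overhead_s):
    """Return the GFLOPS a benchmark would report for one all-pairs step."""
    flops = FLOPS_PER_INTERACTION * bodies * bodies   # all-pairs work
    compute_s = flops / (peak_gflops * 1e9)           # time at full speed
    return flops / (compute_s + overhead_s) / 1e9     # overhead lowers the rate


if __name__ == "__main__":
    # Peak rates and the 0.5 ms overhead are made-up example values.
    for name, peak in (("slow GPU", 150.0), ("fast GPU", 700.0)):
        for bodies in (8 * 1024, 32 * 1024):
            rate = measured_gflops(bodies, peak, overhead_s=0.0005)
            print(f"{name}, {bodies // 1024}K bodies: {rate:.0f} GFLOPS")
```

In this model the slow GPU spends almost all of its time computing at either size, so the overhead barely shows, while the fast GPU finishes the 8K step so quickly that the same overhead takes a noticeable share of the time.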
Download links are in the first post.
- OS X 10.6.1 updated the ATI/Intel + Nvidia OpenGL drivers, but the OpenCL framework stays the same. So ATI will (I am sure) still fail under 10.6.1 when running the Apple OpenCL demos listed here.
Apple is rewriting the GALAXIES OpenCL demo/bench - the Nvidia GTX 285 will rise from 280 GigaFlops to around 400 GigaFlops.
CPU GigaFlops stay the same: around 28 GigaFlops on a C2D / 100 GigaFlops on a Mac Pro 2009.
This new version will be compiled and shared here like the last one. Owners of slower OpenCL GPUs, like my 9600 GT, should not expect such a big GigaFlop boost from the new Galaxies (N-Body) Apple demo.
OPENCL - Good to know:
- OpenCL is an API for general-purpose GPU (and CPU) computing.
- Main difference to CUDA / ATI Stream: each of those works only with its "own" GPU.
A CUDA (NV) app like Badaboom (H.264 on the GPU) can't work on an ATI GPU, and vice versa.
- OpenCL is universal across GPU vendors, which means:
- Xcode / GCC compiles an app that contains the source for the GPU program (in C, as a string).
That C source, unlike with CUDA / ATI Stream, is compiled later by the OpenCL framework at runtime!
So the same app can run on completely different GPUs and also, with little or no code change, on the CPU if no
OpenCL-capable GPU (newer ones) is found.
The source (example below) for the GPU program is really compiled at runtime, not just interpreted.
So small differences between runs of my bench may happen because of that compile at run time.
Information from the ATI OS X OpenCL division dev team:
Thank you for the quick response and I hope you extend the benchmark application since it’s a really good idea. Regarding the sample applications posted on the developer.apple.com website (eg. Galaxies, Qjulia, etc), we are aware that some fail (or even crash) on AMD hardware and working to track down all these issues. We suspect that most of these issues will be resolved for the next graphics driver update in Snow Leopard.
BTW, I ran the demos on a iMac with a Radeon 4850 and I get the following results:
G A L A X I E S - a CPU vs GPU GigaFlop Bench
Following a hint from ATI, I increased the number of things to compute from 4K to 16K.
The new 16K version is available and shows 16K in the legend.
Apple reduces that count for GPUs without discrete VRAM (9400M / 8600M), so it is set to 4K there even though the legend shows 16K.
You can't compare 9400M / 8600M results 1:1 with the others, which run the 16K count.
Using GALAXIES:
key s = switch compute Modes
>CPU>Single/Multi, CPU-Vector/SSE Single/Multi>GPU> GPU+CPU> (bold=start Mode)
key SPACE = Pause/resume
key 6 = Reset scene
key Q = QUIT
DOWNLOADS (6 MB each):
mitch (C2D 3 GHz, NV 9600 GT, 1280x1024)
V1/V2 24 Gigaflops : CPU (SIM: Vector Multi-Core CPU mode)
112 Gigaflops : NV 9600 GT 16K
V2 142 Gigaflops : NV 9600 GT 32K
Users results (new 32K V2 Version):
CPU 18 G Nvidia 9600gt gpu 149 GigaFlops 32k v2.0 at 1680x1050
Gigabyte Ga-eg45m-ud2r, Intel e6750,
Mac Pro Nehalem 8 core 2.93GHz:http://www.barefeats.com/index.html
All tests 2500x1600!
Nvidia GTX 285 ...... soon!!
Mac Pro Nehalem 8 core CPU = .....
New OpenCL Transpose Bandwidth - measures the bandwidth of a matrix transpose
Download link at the end of the post (very small; run it like all the other terminal OpenCL bench apps)
Nvidia 9600 GT: around 39 GB/sec
Mac Pro (1,1) 2.66Ghz 4GB RAM, 4870 1GB sapphire
Performing Matrix Transpose [256 x 4096]...
Bandwidth Achieved = 3.160816 GB/sec
MacPro 2009, NVidia GTX 285 Mac
Bandwidth Achieved = around 80 GB/sec
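For orientation, a transpose bandwidth figure of this kind is commonly derived as bytes read plus bytes written, divided by the kernel time. The sketch below assumes 4-byte float elements and uses a made-up timing. Neither is taken from the benchmark's source, and the tool itself may count bytes differently.

```python
def transpose_bandwidth_gb_s(rows, cols, seconds, bytes_per_element=4):
    """Effective bandwidth of one transpose: each element is read once and written once."""
    bytes_moved = 2 * rows * cols * bytes_per_element
    return bytes_moved / seconds / 1e9  # assumption: 1 GB counted as 10**9 bytes


if __name__ == "__main__":
    # Hypothetical 1 ms timing for the 256 x 4096 matrix the benchmark reports.
    print(transpose_bandwidth_gb_s(256, 4096, seconds=0.001))
```

Because the matrix size is fixed, the reported GB/sec depends only on how fast the kernel runs, which is why the figure works as a simple memory-throughput comparison between cards.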
New OpenCL AO Bench (512*512 instead of the 256*256 that barefeats uses, so results here = barefeats results / 2)
DL Link at the end (very small, SSE4 optimized, 512*512 Window )
NV 9600 GT : 8 FPS (512*512)
C2D 3 GHZ : 0.8 FPS
So ATI users may try the newly compiled
Procedural Geometric Displacement FPS Bench
Download for ATI + Nvidia USERS:
OpenCL_Displacement_Bench.zip - with step by step HOW TO RUN readme -
A newly compiled Displacement (the app only), built with GCC 4.2 using much less aggressive optimization settings, seems to run more stably / reliably on the ATI 4870.
If you have such problems with Displacement, I attached the small download to the first post as displacement_ATi; use it to overwrite the app in the full (normal) 7 MB download.
QJulia1024 Results (the qJulia with 1024x1024 window size)
9600 GT, around 13 FPS static - please wait a few seconds and let the bench show the static FPS first, before you switch to animate (SPACE)
8-16 FPS when animating
Rob GTX 285 Mac
1024x1024 = 44 fps
eVGA GTX-285 1024MB 46.70fps
8800 GT 22.46 FPS
qJulia Results (800x800 window)
9600GT : around 29 FPS static, 16 - 60 FPS when animating (key SPACE)
eVGA GTX-285 1024MB 98.79 fps
Rob GTX 285 Mac
800x800 = 93 fps
2.4GHz C2D, 4GB RAM, 8600M GT (256MB)
static shows 10 - 11 FPS animated shows 9 - 11 FPS
MacBook Pro 13", GPU GeForce 9400M
running at 6.25 fps (6-6.50)
OpenCL Displacement FPS results (ATI should work !)
9600 GT 80 FPS first (white background + shadow) / 102 FPS second (with texture in background)
ATI 4870 both around 90 FPS - but only 1 in 3 starts of the bench was successful, so that bench didn't work 100% well either - wait for OS X 10.6.1
Geforce GTX285 Mac : around 220 FPS, second shader test
Radeon 4870 1GB (sapphire)
Mac Pro (1,1) 2.66Ghz quad core, 4GB RAM
both scenes near 90 FPS
Most 4870s are close together, between 84 and 90 FPS - but some tests fail and some show corrupted graphics in the result window
GRASS simulates a scene of grass sticks moving in the wind
Grass Results: 4 million triangles + 170,000 sticks to compute - big scene! (1024x1024 window size)
9600 GT , around 53 FPS
2.4GHz C2D, 4GB RAM, 8600M GT (256MB) 27 - 29 FPS
8800 GT 56.97 fps
i920 Overclocked to 4.4Ghz, 1760Mhz DDR3, PCIE-100Mhz,
eVGA GTX-285 1024MB 95.50 fps
Rob ( barefeats! Test mule is Mac Pro Nehalem 2.93 Octo)
GeForce GTX 285 Mac = 88 - 91 fps
Quadro FX 4800 = 77 fps steady
GeForce 8800 GT = 54 fps steady
GeForce GT 120 = 35 fps steady
DL for qJulia + Grass (has GUI) (at the end)
Read the readme - you will get a file-not-found error (the app loads the qjulia.cl OpenCL source) if you didn't change the terminal directory to the app folder before running the command-line app.
For all GLUT apps (terminal apps: Transpose + qJulia + AO), check in the app preferences that SYNC is OFF (see the screenshot of the OpenCL AO preferences).