All OpenCL Benches: RAYTRACING/Galaxies/Grass/qJulia/Displacement...

wesux · September 17, 2009

I am still wondering whether Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL.

My Mac Pro is a 2006 model, so I cannot test it. I suppose I could do a Barefeats deal, get an ATI Radeon 4870, and test it.

The 4870 isn't well supported for OpenCL. I've got a Mac Pro '06, and the card performs almost the same as my CPU and crashes most of the time. As Mitch mentioned, we'll have to wait for the Apple ATI dev team to fix these problems.

Now on the topic of FCS: none of the applications go out of their way to support OpenCL or GCD, so don't expect any speed gains from them. Video encoding on GPUs is still at a relatively primitive stage; most encoders don't support two-pass/multi-pass encoding, motion prediction, or much of the nice stuff that makes your video smooth. I believe this is why Apple released FCS before Snow Leopard: they couldn't meet the deadline with an acceptable OpenCL-compliant encoder. If this gets patched for current users within months, that would be fantastic, but in my opinion (so take it with a grain of salt) it is highly unlikely.

sch8mid · September 23, 2009

right

Isn't FC still 32-bit, using only 2 GB of memory at most?

Come on, Randy Ubillos and the FC team...

But some interesting news:

AMD/ATI seems to be really dedicated to OpenCL.

Today AMD put out a press release saying that the company is awaiting OpenCL certification from the Khronos Working Group.

On August 8, ATI released a beta SDK for x86-based CPUs (certified by Khronos on September 3),

and ATI Stream SDK v2.0 will be ready this year (project book + = source forge).

As of today (09/23) we will see the new DirectX 11 cards (RV870), with support for DirectCompute.

From a technical point of view, these new 40 nm cards seem to be light-years ahead of their Nvidia counterparts.

We all know that, historically, this has always been the case in recent years, but we have obviously very often had to deal with very weak ATI driver support too.

Let's hope Apple becomes aware of this new situation soon and gives us some alternatives to the green camp.

As an HTPC user, only red cards will find their way into my rig.

Best

as

mitch_de · September 29, 2009

New Galaxies OpenCL Bench V2:

- Apple updated/fixed some OpenCL API usage (may help ATI)

- A small speedup (10% on my 9600 GT)

I have now built a 32K and an 8K version: use 32K for fast/high-end GPUs/CPUs and 8K for low-end CPUs/GPUs.

If the GPU is the limit, there will be no difference in GPU gigaflops between the two. But on very fast GPUs, 32K may give much higher GPU gigaflops: more GPU load means less time wasted on OpenCL overhead.

Download links are in the first post.
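
The 32K versus 8K point can be sketched with a toy model: if every frame pays a fixed overhead (kernel launch, syncing) on top of the actual compute time, a bigger workload hides that overhead better, and the effect is larger on a fast GPU. The all-pairs flop count, the 20 flops per star pair, the peak rates and the 2 ms overhead are all assumptions for illustration, not values taken from Galaxies.

```python
# Toy model: fixed per-frame overhead plus compute time at a peak rate.
# All constants are illustrative assumptions, not measured from Galaxies.

FLOPS_PER_INTERACTION = 20  # assumed cost of one star-star interaction


def frame_flops(n_stars: int) -> float:
    """Flops for one all-pairs step (every star against every star)."""
    return FLOPS_PER_INTERACTION * n_stars * n_stars


def effective_gflops(n_stars: int, peak_gflops: float, overhead_s: float) -> float:
    """Gigaflops actually achieved once the fixed overhead is included."""
    work = frame_flops(n_stars)
    compute_s = work / (peak_gflops * 1e9)
    return work / (compute_s + overhead_s) / 1e9


if __name__ == "__main__":
    for n in (8 * 1024, 32 * 1024):
        for peak in (50.0, 500.0):  # a slow and a fast hypothetical GPU
            g = effective_gflops(n, peak, overhead_s=0.002)
            print(f"{n:6d} stars, peak {peak:5.0f} Gflops -> {g:6.1f} effective")
```

In this model the slow device reports nearly the same figure for both sizes, while the fast device gains noticeably from the larger star count.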

tinush · September 29, 2009

Very nice Mitch_de

Results 32k v2.0 at 1680x1050

Sim Vector

S-core cpu 10 G

M-core cpu 18 G

9600gt gpu 149 G

Hybrid M&G 34 G

Thnx

T.

mitch_de · September 29, 2009

Thanks!

Have you also tried the 8K version? Computing 32K stars is very heavy work for any C2D CPU, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS there). The 8K version gives lower GPU gigaflops but shows extremely fast star movement compared to CPU mode.

lamer0 · September 30, 2009

Mirror for Galaxies; I hate Rapidshare with a passion.

http://victori.uploadbooth.com/osx86/galaxies-32k-v2.zip

32K version:

Q8200 (quad core) + 8600GT: 32 Gflops

8600GT alone: 48 Gflops. Okay... how is the hybrid approach slower?

mitch_de · September 30, 2009

Hybrid is slower than GPU-only (and sometimes also slower than CPU-only) because it needs much more syncing and data-transfer time between CPU and GPU than either device does alone.

OpenCL's bottleneck is the very slow PCIe data transfer compared with the transfer speed between CPU and main memory: 2-5 GB/s over PCIe vs. up to 50 GB/s between CPU, L2/L3 cache, and memory. The GPU itself also has very fast memory access, up to 160 GB/s. But getting the data to the GPU and reading it back is the problem (on fast GPUs).

So PCIe bandwidth limits the overall performance benefit of OpenCL (and CUDA).

Some tests show that transfers to and from the GPU can take 80% of the overall time! So the GPU computes very fast, but the time to get data to and from it can be the bottleneck.

For example, a 2009 Mac Pro may get higher gigaflops CPU-only than with a GT 120 GPU. Reason: the GPU is too slow, plus PCIe transfer times.

The same GPU in a low-end C2D system is much faster than the C2D CPU alone.

PCIe transfer speed is also a problem for Core Image.

This is one reason why, back when CI was first used on AGP Macs, it performed badly and got a bit "lost".

AGP bandwidth is very, very bad in the GPU-to-CPU direction: less than 250 MB/s. In the other direction, CPU to GPU (the normal gaming path), it is up to 1 GB/s.

So they made PCIe, which was much better, but I think that with the very fast 5870 and GT 300 coming in the next 2 years, PCIe will need to be updated again to a faster speed.
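
The transfer argument above can be made concrete with a rough model of one offloaded job: upload over PCIe, compute on the GPU, download the result. This is only a sketch; the default PCIe bandwidth sits in the 2-5 GB/s range quoted in the post, while the buffer sizes, flop count and GPU rate in the demo are made-up numbers, and the function names are hypothetical.

```python
# Toy model of one offloaded job: upload over PCIe, compute, download.
# The PCIe default echoes the rough 2-5 GB/s figure from the post; the
# demo workload (sizes, flops, GPU rate) is a made-up illustration.


def transfer_seconds(n_bytes: float, gb_per_s: float) -> float:
    """Time to move n_bytes over a link with the given bandwidth."""
    return n_bytes / (gb_per_s * 1e9)


def gpu_job_seconds(bytes_in, bytes_out, flops, gpu_gflops, pcie_gb_per_s=3.0):
    """Return (upload, compute, download) times in seconds."""
    upload = transfer_seconds(bytes_in, pcie_gb_per_s)
    compute = flops / (gpu_gflops * 1e9)
    download = transfer_seconds(bytes_out, pcie_gb_per_s)
    return upload, compute, download


def transfer_share(upload, compute, download):
    """Fraction of the total job time spent moving data."""
    return (upload + download) / (upload + compute + download)


if __name__ == "__main__":
    # Hypothetical job: 256 MB each way, 2 Gflop of work, 100 Gflops GPU.
    times = gpu_job_seconds(256e6, 256e6, 2e9, gpu_gflops=100.0)
    print("upload/compute/download (s):", times)
    print(f"share of time spent on transfers: {transfer_share(*times):.0%}")
```

With little arithmetic per byte moved, the transfers dominate the total time; raising the work per byte, or keeping data on the GPU between kernels, shrinks that share.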

tinush · September 30, 2009

Thanks!
Have you also tried the 8K version? Computing 32K stars is very heavy work for any C2D CPU, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS there). The 8K version gives lower GPU gigaflops but shows extremely fast star movement compared to CPU mode.

not yet, will test this later

just finished a perfect retail 100% working snow install (incl auto-sleep & keyboard/mouse wake) :blink:

T.

shoarthing · October 6, 2009

Thanks!
Have you also tried the 8K version? Computing 32K stars is very heavy work for any C2D CPU, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS there). The 8K version gives lower GPU gigaflops but shows extremely fast star movement compared to CPU mode.

. . Rapidshare link for the 8K_V2 doesn't work.

Edit: Sorry - link works fine - my ISP has just now started to block Rapidshare

osssua · October 6, 2009

. . Rapidshare link for the 8K_V2 doesn't work.

(tried 2 browsers, & jdownloader: my ISP doesn't block Rapidshare)

Am trying to bench an Atom330 ION MCP79/7A motherboard so only this version likely to run at all . . would appreciate if some kind soul would mirror this version to a working link.

TIA

Galaxies now works on the ATI 4870 with the 10.6.2 seed:

http://netkas.org/?p=240

shoarthing · October 7, 2009

Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]

CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.

. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini

Schenkenberg · October 7, 2009

Awesome post!

Galaxies 32k running on 1900x1200 in SL 10.6.1

i7 920 @ 3.4GHz

GTX 260

Vector Single Core CPU: 14

Vector Multi Core: 57 (that's what I call proper multi-core scaling!)

GPU: 275! Came up from 180 with the 8k benchmark!

CPU+GPU: 95

I really love the 260's performance. For that pricepoint (got for €140) it really shines.

n00b32 · October 9, 2009

Hi,

I added a table for better comparison of the OpenCL benchmarks:

http://wiki.osx86project.org/wiki/index.php/OpenCL

What would be the best benchmark for evaluation of OpenCL performance? Galaxies?

@mitch_de:

Could you make a build number visible in this standard benchmark while it runs (for better comparison)?

Thanks

Jason

nvidia2008 · October 10, 2009

Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]

CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.

. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini

MacBook 2.0GHZ Aluminium 4GB DDR3 RAM

Galaxies 8K V2

GPU 9400M mode 20 Gflops/ 15 Updates

Hybrid mode CPU+GPU 5 Gflops/ 60 Updates (crashed quite a lot but when working I took this measurement)

Mac OS X 10.6.1

shoarthing · October 10, 2009

. . thank you *very* much for posting this: I knew the GPU & Shader clocks were supposed to be the same on the MCP7x variants; but nice to have it confirmed.

Surprised the Macbook's DDR3 didn't make a solid difference tho' . . . . .

cwestpha · October 12, 2009

Hmm on my 2008 Mac Pro 2.8 Ghz 8-core with 285 GTX running the 32K galaxies V2 under 10.6.1:

CPU: 11

multi core: 88

GPU: 329

hybrid: 123

n00b32 · October 13, 2009

Hi,

how come CUDA on Mac OS X? Where did you get these drivers from?

All my results are with cuda drivers.

shoarthing · October 13, 2009

. . . Nvidia downloads

NB: v2.3x [the current one] 32-bit only AFAIK . . to get an idea of where this is at see the relevant section of the NV forums

n00b32 · October 13, 2009

thnx, didn't know that

right now it's only the relevant apps that aren't there yet ;-)

MarceloDub · October 30, 2009

good

byronrock · November 1, 2009

Is there a way to run the other benchmarks on a hackintosh?

I can only run Galaxies (by the way, I get 30G with my Athlon x4 720).

I want to run Displacement, but the terminal says "bad cpu type in executable" and then "logout".

I have a 9400GT.

Am I doing something wrong?

@ROBASEFR · November 11, 2009

Hi

I've had success with an ATI HD4850 (Gainward GS 512) under OS X 10.6.2 on my hackintosh.

All tests were run on a 1920x1080x32, 60 Hz HD LCD monitor.

Displacement: 43 fps.

Galaxies 32K V2.0 and 8K V2.0 both worked!

When I toggle with the S key I get:

2; 13; 47; 69; 52 Gflops

OpenCL Bench V 0.20 by mitch

....CL_DEVICE_NAME: Intel® Core i7 CPU 920 @ 2.67GHz .....

CL_DEVICE_VENDOR: Intel

CL_DEVICE_MAX_CLOCK_FREQUENCY: 3096 MHz

CL_DEVICE_MAX_COMPUTE_UNITS: 8

Now computing - please be patient....

time used: 9.933120

Number of elements computed: 2097152

....CL_DEVICE_NAME: Radeon HD 4870 .....

CL_DEVICE_VENDOR: AMD

CL_DEVICE_MAX_CLOCK_FREQUENCY: 750 MHz

CL_DEVICE_MAX_COMPUTE_UNITS: 10

Now computing - please be patient....

time used: 16.656227

Number of elements computed: 2097152

Now checking if results are valid - please be patient....

Validate results test passed - GPU=CPU

logout
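
To compare the two devices in the log above, the elapsed times can be turned into rates. The sketch below only uses the figures printed by the bench (element count and "time used"); the log does not say what an "element" costs, so only the relative comparison is meaningful, and the helper names are hypothetical.

```python
# Figures copied from the "OpenCL Bench V 0.20" log above.
ELEMENTS = 2_097_152
RUNS = {
    "Core i7 920 (CPU)": 9.933120,      # "time used", in seconds
    "Radeon HD 4870 (GPU)": 16.656227,  # "time used", in seconds
}


def elements_per_second(elements: int, seconds: float) -> float:
    """Throughput of one run."""
    return elements / seconds


def slowdown(seconds: float, baseline_seconds: float) -> float:
    """How many times longer a run took than the baseline run."""
    return seconds / baseline_seconds


if __name__ == "__main__":
    fastest = min(RUNS.values())
    for name, secs in RUNS.items():
        rate = elements_per_second(ELEMENTS, secs)
        print(f"{name}: {rate:,.0f} elements/s, "
              f"{slowdown(secs, fastest):.2f}x the fastest time")
```

On these numbers the GPU run took longer than the CPU run, which fits the earlier remarks in the thread about weak ATI OpenCL support at the time.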

And the transpose bandwidth test:

Tests/Open\ CL/OpenCL\ Tranpose\ Bandwidhttest/transpose

Performing Matrix Transpose [256 x 4096]...

Bandwidth Achieved = 2.755923 GB/sec

Results Validated!

:unsure:

dudelolchris · November 13, 2009

All the OpenCL demos crash on my brand new Late 2009 iMac with the ATi 4670 graphics.

This makes me sad.

computergek80 · November 16, 2009

They crash for me too: 27" iMac, Radeon HD 4670. Anyone know what's up?

mitch_de · November 18, 2009

Be patient.

Apple will surely fix these OpenCL problems by spring 2010.

Even 3+ months after 10.6 came out, there is NO application that needs or uses OpenCL.

Apple also hasn't used OpenCL in any of its own apps (sure, it would have been a failure if they had).

Upcoming (spring 2010 and later) versions of iTunes, iMovie, iDVD, FCP, Logic, ... will have OpenCL speedups!

So these problems don't really hurt if it's only demos and benches that won't run on your GPU.

I will update the benches soon with newer versions (updated Apple OpenCL demos).
