
All OpenCL Benches: RAYTRACING/Galaxies/Grass/qJulia/Displacement...


mitch_de

125 posts in this topic


I am still wondering whether Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL.

 

My Mac Pro is a 2006 model, so I cannot test it. I suppose I could do a Barefeats-style test: get an ATI Radeon HD 4870 and try it.

 

The 4870 isn't well supported for OpenCL. I've got a Mac Pro '06, and the card performs almost the same as my CPU and crashes most of the time. As Mitch mentioned, we'll have to wait for the Apple ATI dev team to fix these problems.

 

Now on the topic of FCS: none of the applications go out of their way to support OpenCL or GCD, so don't expect any speed gains from them. Video encoding on GPUs is still at a relatively primitive stage; most GPU encoders don't support two-pass/multi-pass encoding, motion prediction, or much of the nice stuff that makes your video smooth. I believe this is why Apple released FCS before Snow Leopard: they couldn't meet the deadline with an acceptable OpenCL-based encoder. If this were patched in for current users within months, that would be fantastic, but in my opinion (so take it with a grain of salt) that is highly unlikely.



right

isn't FC still 32-bit, using only 2 GB of memory max???

 

come on, Randy Ubillos and the FC team ...

 

 

but some interesting news :

 

AMD/ATI seems to be really dedicated to OpenCL

 

Today AMD put out a press release saying that the company is awaiting OpenCL certification from the Khronos Working Group.

On August 8th ATI released a beta SDK for x86-based CPUs (certified by Khronos on September 3rd)

 

and ATI Stream SDK v2.0 will be ready this year (project book + = source forge)

 

As of today (09/23) we will see the new DirectX 11 cards (RV870), with support for DirectCompute.

 

From a technical point of view, these new 40nm cards seem to be light-years ahead of their Nvidia counterparts.

 

We all know that, historically, this has always been the case in recent years, but we very often had to deal with very weak ATI driver support too.

 

Let's hope that Apple becomes aware of this new situation soon and gives us some alternatives to the green camp.

 

As an HTPC user, only red cards will find their way into my rig.

 

Best

as


New Galaxies OpenCL Bench V2:

- Apple updated / fixed some OpenCL API usage (may help ATI)

- small speed-up (10% on my 9600 GT)

 

I have now built a 32K and an 8K version: use 32K for fast/high-end GPUs/CPUs and 8K for low-end CPUs/GPUs.

If the GPU is the limit, there will be no difference in GPU gigaflops. But on very fast GPUs 32K may give much higher GPU gigaflops: more GPU load means less time wasted on OpenCL overhead.

 

Download links are in the first post.
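
The point above about OpenCL overhead can be illustrated with a toy model. This is only a sketch: the fixed per-frame overhead, the peak rates, and the figure of 20 floating-point operations per star pair are assumed values chosen for illustration, not measurements from the Galaxies bench.

```python
# Toy model: measured GFLOPS when every frame pays a fixed OpenCL overhead.
# All constants below are illustrative assumptions, not values from the bench.

FLOPS_PER_PAIR = 20        # assumed cost of one star-star interaction
OVERHEAD_S = 0.002         # assumed fixed per-frame cost (launch, sync, copies)


def measured_gflops(n_stars: int, peak_gflops: float) -> float:
    """Effective GFLOPS for one frame of an all-pairs simulation."""
    flop = FLOPS_PER_PAIR * n_stars * n_stars
    compute_s = flop / (peak_gflops * 1e9)
    return flop / (compute_s + OVERHEAD_S) / 1e9


for peak in (50.0, 500.0):           # a slow and a fast hypothetical GPU
    small = measured_gflops(8 * 1024, peak)
    large = measured_gflops(32 * 1024, peak)
    print(f"peak {peak:5.0f}: 8K -> {small:6.1f}, 32K -> {large:6.1f} GFLOPS")
```

In this model the slow GPU reads almost the same in both versions, while the fast GPU reads noticeably higher with 32K, because the fixed overhead is a larger share of a short frame.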



Very nice Mitch_de

 

Results 32k v2.0 at 1680x1050

 

Sim Vector

 

S-core cpu 10 G

M-core cpu 18 G

9600gt gpu 149 G

Hybrid M&G 34 G

 

 

Thnx

T.


Thanks !

Have you also tried the 8K version? Computing 32K stars is very heavy work for any C2D CPU, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version gives fewer GPU gigaflops but shows extremely fast star movement compared to the CPU.
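
A rough way to see why 32K is so much heavier than 8K: assuming the bench computes every star pair each frame (the usual brute-force N-body approach; the thread does not state this), the work grows with the square of the star count. In the sketch below, the 20 operations per pair and the 10 GFLOPS CPU figure are assumptions for illustration.

```python
# Rough frame-rate estimate for an all-pairs N-body step.
# FLOPS_PER_PAIR and the 10 GFLOPS CPU figure are assumptions.

FLOPS_PER_PAIR = 20


def frames_per_second(n_stars: int, sustained_gflops: float) -> float:
    """Frames per second if each frame evaluates all n_stars^2 pairs."""
    pairs = n_stars * n_stars
    return sustained_gflops * 1e9 / (pairs * FLOPS_PER_PAIR)


ratio = (32 * 1024) ** 2 / (8 * 1024) ** 2
print(f"32K needs {ratio:.0f}x the pair interactions of 8K")

# Hypothetical CPU sustaining 10 GFLOPS:
print(f"8K:  {frames_per_second(8 * 1024, 10.0):.2f} FPS")
print(f"32K: {frames_per_second(32 * 1024, 10.0):.2f} FPS")
```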


q8200 (quad core)/8600GT - 32 gflops :)

8600GT - 48gflops --- okay? how is the hybrid approach slower?

 

Hybrid is slower than GPU only (and sometimes slower than CPU only) because it needs much more syncing and data-transfer time between CPU and GPU than either device alone.

OpenCL's bottleneck is the very slow PCIe data transfer compared with CPU-to-main-memory transfer speed: 2-5 GB/s over PCIe versus up to 50 GB/s between CPU and L2/L3/memory. The GPU itself also has very fast memory access, up to 160 GB/s. But getting the data to the GPU and reading it back is the problem (on fast GPUs ;)

So PCIe bandwidth limits the overall performance benefit of OpenCL (and CUDA).

Some tests show that transfers to and from the GPU may take 80% of the overall time! So the GPU computes very fast, but the time to get data to and from the GPU can be the bottleneck.
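
To put rough numbers on the transfer bottleneck, here is a small sketch. The bandwidth range is the one quoted above; the job size (256 MB each way) and the 40 ms of GPU compute are made-up values, and transfers are assumed not to overlap with compute.

```python
# Share of wall time spent moving data over the bus versus computing on the GPU.
# Bandwidths are the rough figures quoted in the post; the workload is made up.


def transfer_share(bytes_moved: float, bus_gb_s: float, compute_s: float) -> float:
    """Fraction of total time spent on host<->GPU transfers (no overlap assumed)."""
    transfer_s = bytes_moved / (bus_gb_s * 1e9)
    return transfer_s / (transfer_s + compute_s)


# Hypothetical job: 256 MB up, 256 MB back, 40 ms of GPU compute.
moved = 2 * 256e6
for bus in (2.0, 5.0):                      # PCIe range quoted above, in GB/s
    share = transfer_share(moved, bus, 0.040)
    print(f"{bus:.0f} GB/s bus: {share:.0%} of the time is transfer")
```

With these assumed numbers the transfers dominate at both ends of the PCIe range, which is the same order as the 80% figure mentioned above.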

 

For example, a 2009 Mac Pro may get higher gigaflops CPU-only than with a GT 120 GPU. Reason: the GPU is too slow, plus PCIe transfer times.

The same GPU in a low-end C2D system is much faster than the C2D CPU alone.

 

PCIe transfer speed is also a "problem" for Core Image.

This is one reason why, back when CI was first used on AGP Macs, CI performed badly and got a bit "lost".

AGP bandwidth is very, very bad in the GPU-to-CPU direction: less than 250 MB/s. The other direction, CPU to GPU (the normal gaming path), reaches up to 1 GB/s.

So they made PCIe, which was much better, but I think that with the very fast 5870 and GT300 coming, PCIe will need another speed update within the next 2 years.


Thanks !

Have you also tried the 8K version? Computing 32K stars is very heavy work for any C2D CPU, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version gives fewer GPU gigaflops but shows extremely fast star movement compared to the CPU.

 

not yet, will test this later

just finished a perfect retail 100% working snow install (incl auto-sleep & keyboard/mouse wake) :blink:

 

T.


Thanks !

Have you also tried the 8K version? Computing 32K stars is very heavy work for any C2D CPU, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version gives fewer GPU gigaflops but shows extremely fast star movement compared to the CPU.

. . Rapidshare link for the 8K_V2 doesn't work.

 

Edit: Sorry - link works fine - my ISP has just now started to block Rapidshare


. . Rapidshare link for the 8K_V2 doesn't work.

 

(tried 2 browsers, & jdownloader: my ISP doesn't block Rapidshare)

 

Am trying to bench an Atom330 ION MCP79/7A motherboard so only this version likely to run at all . . would appreciate if some kind soul would mirror this version to a working link.

 

TIA

 

 

Galaxies now works on the ATI 4870 with the 10.6.2 seed

 

http://netkas.org/?p=240


Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]

 

CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.

 

. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini


Awesome post!

 

Galaxies 32k running on 1900x1200 in SL 10.6.1

i7 920 @ 3.4GHz

GTX 260

 

Vector Single Core CPU: 14

Vector Multi Core: 57 (that's what I call proper multi-core scaling!)

GPU: 275! Came up from 180 with the 8k benchmark!

CPU+GPU: 95

 

I really love the 260's performance. For that pricepoint (got for €140) it really shines.


Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]

 

CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.

 

. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini

 

MacBook 2.0GHZ Aluminium 4GB DDR3 RAM

 

Galaxies 8K V2

GPU 9400M mode 20 Gflops/ 15 Updates

Hybrid mode CPU+GPU 5 Gflops/ 60 Updates (crashed quite a lot but when working I took this measurement)

 

Mac OS X 10.6.1


. . thank you *very* much for posting this: I knew the GPU & Shader clocks were supposed to be the same on the MCP7x variants; but nice to have it confirmed.

 

Surprised the Macbook's DDR3 didn't make a solid difference tho' . . . . .


Hmm on my 2008 Mac Pro 2.8 Ghz 8-core with 285 GTX running the 32K galaxies V2 under 10.6.1:

CPU: 11

multi core: 88

GPU: 329

hybrid: 123
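
For comparison, the 32K V2 results posted so far can be reduced to two ratios: multi-core speed-up over single core, and how much of the naive "CPU plus GPU" sum the hybrid mode actually reaches. This is just arithmetic on the numbers reported in the thread; the system labels are shorthand and the "CPU+GPU sum" yardstick is an assumption about what an ideal hybrid would deliver.

```python
# Multi-core speed-up and hybrid share from the 32K V2 results posted in the thread.
# Tuples are (single-core, multi-core, GPU, hybrid) in GFLOPS as reported.

results = {
    "9600 GT system (CPU not stated)": (10, 18, 149, 34),
    "i7 920 + GTX 260": (14, 57, 275, 95),
    "Mac Pro 8-core + GTX 285": (11, 88, 329, 123),
}

for name, (single, multi, gpu, hybrid) in results.items():
    speedup = multi / single
    # Share of the naive ideal (multi-core CPU plus GPU) that hybrid reaches.
    hybrid_share = hybrid / (multi + gpu)
    print(f"{name}: {speedup:.1f}x multi-core, "
          f"hybrid at {hybrid_share:.0%} of the CPU+GPU sum")
```

In all three cases hybrid lands well below GPU-only, which fits the syncing and transfer explanation given earlier.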


  • 3 weeks later...

Is there a way to run the other benchmarks on a hackintosh?

I can only run Galaxies (by the way, I get 30 G with my Athlon x4 720).

 

I want to run Displacement, but it says "bad cpu type in executable" followed by "logout".

I have a 9400 GT.

 

Am I doing something wrong?
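
One way to investigate: "bad cpu type in executable" generally means the binary contains no code for an architecture the system can run. Whether that is the cause here is an assumption, since the thread does not confirm it. On a Mac, `file` or `lipo -info` report the architectures of a binary; the sketch below does the same check by reading the Mach-O header. The function name `architectures` is made up for this example.

```python
# List the CPU architectures inside a Mach-O executable (thin or universal).
# Hypothetical helper for illustration; it only inspects the file header.
import struct

CPU_NAMES = {7: "i386", 0x01000007: "x86_64", 18: "ppc", 0x01000012: "ppc64"}
FAT_MAGIC = 0xCAFEBABE                      # universal binary, big-endian header
THIN_MAGICS = (0xFEEDFACE, 0xFEEDFACF)      # 32-bit and 64-bit Mach-O


def architectures(data: bytes) -> list[str]:
    """Return architecture names found in the leading bytes of a Mach-O file."""
    if len(data) < 8:
        return []
    (magic,) = struct.unpack(">I", data[:4])
    if magic == FAT_MAGIC:
        (count,) = struct.unpack(">I", data[4:8])
        if len(data) < 8 + 20 * count:
            return []
        # Each fat_arch entry is 20 bytes and starts with the CPU type.
        types = [struct.unpack(">i", data[8 + 20 * i: 12 + 20 * i])[0]
                 for i in range(count)]
    else:
        for order in (">", "<"):            # thin files use the CPU's byte order
            magic, cpu = struct.unpack(order + "Ii", data[:8])
            if magic in THIN_MAGICS:
                types = [cpu]
                break
        else:
            return []
    return [CPU_NAMES.get(t, f"unknown({t})") for t in types]
```

Usage would be something like `architectures(open(path, "rb").read(4096))` on the Displacement binary; if the list lacks the architecture your install runs, the error is expected.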


  • 2 weeks later...

Hi

 

I've had success with an ATI HD4850 (Gainward GS 512) under OS X 10.6.2 on my hackintosh.

 

All tests were run on a 1920x1080, 32-bit, 60 Hz HD LCD monitor.

Displacement: 43 fps.

Galaxies 32K V2.0 and 8K V2.0 both worked!

 

When I toggle with the S key I get:

2;13;47;69;52 Gflops

 

OpenCL Bench V 0.20 by mitch

 

....CL_DEVICE_NAME: Intel® Core i7 CPU 920 @ 2.67GHz .....

CL_DEVICE_VENDOR: Intel

CL_DEVICE_MAX_CLOCK_FREQUENCY: 3096 MHz

CL_DEVICE_MAX_COMPUTE_UNITS: 8

Now computing - please be patient....

time used: 9.933120

Number of elements computed: 2097152

....CL_DEVICE_NAME: Radeon HD 4870 .....

CL_DEVICE_VENDOR: AMD

CL_DEVICE_MAX_CLOCK_FREQUENCY: 750 MHz

CL_DEVICE_MAX_COMPUTE_UNITS: 10

Now computing - please be patient....

time used: 16.656227

Number of elements computed: 2097152

Now checking if results are valid - please be patient....

;) Validate results test passed - GPU=CPU :P

logout
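
Reading the log above: both devices processed the same number of elements, so the two times can be compared directly. A small sketch of the arithmetic, using only the numbers printed in the log (the device labels are copied from it):

```python
# Throughput implied by the two runs in the log above (same element count).
ELEMENTS = 2_097_152
runs = {"Core i7 920 (CPU)": 9.933120, "Radeon HD 4870 (GPU)": 16.656227}

rates = {name: ELEMENTS / seconds for name, seconds in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate / 1e3:.0f}k elements/s")

slowdown = runs["Radeon HD 4870 (GPU)"] / runs["Core i7 920 (CPU)"]
print(f"GPU run took {slowdown:.2f}x as long as the CPU run")
```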

 

And the transpose bandwidth test:

Tests/Open\ CL/OpenCL\ Tranpose\ Bandwidhttest/transpose

Performing Matrix Transpose [256 x 4096]...

Bandwidth Achieved = 2.755923 GB/sec

Results Validated!

 

:unsure:


Be patient.

Apple will surely fix those OpenCL problems by spring 2010.

Even after 3+ months of 10.6 there is NO application out that needs or uses OpenCL.

Apple also doesn't use OpenCL in any of its own apps (sure, it would have been a failure if they had).

Upcoming versions (spring 2010 and later) of iTunes, iMovie, iDVD, FCP, Logic, ... will have OpenCL speed-ups!

 

So these problems don't really hurt if it's only demos and benches that fail to work on your GPU.

 

I will update the benches soon with newer versions (updated Apple OpenCL demos).

