
All OpenCL Benches: RAYTRACING/Galaxies/Grass/qJulia/Displacement...


124 replies to this topic

#21
JBeed


    InsanelyMac Protégé

  • Just Joined
  • Pip
  • 4 posts
Well, a small update.
As of the new version, running SSE4 and no VSync, I'm getting:
70 Updates/s and 23 Gflops

#22
mitch_de


    InsanelyMacaholic

  • Retired
  • 2,902 posts
  • Gender:Male
  • Location:Stuttgart / Germany
UPDATE:
The ATI devs gave me a hint to increase the count from 4K to 16K.
CPU Gigaflops will stay the same.
But GPUs can show more performance, because with "only" 4K (the count of things to compute) faster GPUs like the GTX 285 are not at their limit!

Even my 9600 GT goes from 97 Giga (4K) to 112 Giga (16K). The CPU can't compute any more (white flag :) ), so 16K CPU Giga = 4K CPU Giga!!

The new version shows 16K in the result legends, and is also built with no VSync and SSE4 optimization.
DL link in the first post
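
For anyone who wants to sanity-check the numbers posted in this thread, here is a rough sketch of how Updates/s and Gflops relate for an all-pairs star simulation. It assumes 20 floating-point operations per star-star interaction, which is a common convention for N-body demos but is not confirmed for this benchmark build. The function names are made up for illustration. Under that assumption, 4096 stars at 70 updates/s comes out at about 23.5 Gflops.

```python
# Assumption (not confirmed for this benchmark): every star interacts with
# every other star once per update, at 20 flops per interaction.
FLOPS_PER_INTERACTION = 20


def nbody_gflops(stars: int, updates_per_second: float) -> float:
    """Gflops implied by a star count and an update rate."""
    interactions = stars * stars  # all-pairs: N * N interactions per update
    return interactions * FLOPS_PER_INTERACTION * updates_per_second / 1e9


def updates_per_second(stars: int, gflops: float) -> float:
    """Update rate a device can sustain at a given Gflops figure."""
    return gflops * 1e9 / (stars * stars * FLOPS_PER_INTERACTION)


if __name__ == "__main__":
    # Same throughput, different star counts: the update rate drops with N squared.
    for label, stars in (("4K", 4096), ("8K", 8192), ("16K", 16384), ("32K", 32768)):
        rate = updates_per_second(stars, 100.0)
        print(f"{label}: {rate:.2f} updates/s at 100 Gflops")
```

Going from 4K to 16K multiplies the work per update by 16, so a device that is already at its limit shows the same Gflops with a much lower update rate.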

#23
proengin


    InsanelyMac Protégé

  • Members
  • Pip
  • 13 posts
Here are my OpenCL_GALAXIES_16K_SSE4_VSYNC_OFF benchmarks for 1280 x 1024 pixels:

1. i920 overclocked to 4.4GHZ - 87 Gigaflops
2. eVGA GTX-285 (100MHz pcie) - 306 Gigaflops

#24
macguitarm


    InsanelyMac Protégé

  • Just Joined
  • Pip
  • 2 posts
Not sure if this is the correct topic / thread

Former Apple Final Cut Pro engineer,

Very interested in OpenCL and Final Cut Studio 3, Compressor specifically.

I have done a bunch of tests on Compressor 3.5 / Qmaster and Leopard 10.5.8 and the new Nehalem Mac Pros.

I tested 14 instances (cores) in Qmaster / Compressor 3.5 and submitted a 40-minute DVCPRO HD clip to be batch / parallel converted to H.264 in 5 separate queues.

It took only 1 hour to do the 40-minute clip, which is pretty good for outputting 5 separate clips.

Now Snow Leopard of course has Grand Central to make this even better, and I will eventually test that.

My main interest is to test OpenCL, and OpenCL specifically with Compressor/ FCP

I am still wondering if Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL

My MacPro is a 2006 MacPro, so I can not test it, I suppose I could do a Barefeats deal and get a Radeon ATI 4870 and test it.

Or I will have to get to my colleague's new Nehalem Mac Pro with dual nVidia GT 120s, although it seems from Barefeats that the GT 120 is the weakest OpenCL card

Very interested in developing this thread / conversation along these lines of Compressor 3.5 and Final Cut Pro 7 and OpenCL, it could be awesome stuff saving a ton of time.

thanks in advance

#25
mitch_de


    InsanelyMacaholic

  • Retired
  • 2,902 posts
  • Gender:Male
  • Location:Stuttgart / Germany
The Apple ATI devs told me that they didn't make the deadline for the 10.6.1 update to fix OpenCL on ATI GPUs.
They will fix it as soon as possible, but that cannot happen before 10.6.2 (some weeks to wait).
OpenCL only works with Nvidia GPUs today.
Also, OpenCL only comes into play if an application uses the OpenCL framework, so only newly developed apps will use OpenCL. "Old" apps, meaning your already installed apps, will not.

#26
wesux


    InsanelyMac Protégé

  • Members
  • Pip
  • 7 posts

I am still wondering if Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL

My MacPro is a 2006 MacPro, so I can not test it, I suppose I could do a Barefeats deal and get a Radeon ATI 4870 and test it.


The 4870 isn't well supported for OpenCL. I've got a Mac Pro '06 and it performs almost the same as my CPU, and most of the time it crashes. As Mitch mentioned, we'll have to wait for the Apple ATI dev team to fix these problems.

Now on the topic of FCS: none of the applications go out of their way to support OpenCL or GCD, so don't expect any speed gains using these applications. Video encoding on GPUs is still at a relatively primitive stage; most encoders don't support two-pass / multi-pass encoding, motion prediction or much of the nice stuff that gets your video smooth. I believe this is why Apple released FCS before Snow Leopard: they couldn't reach a deadline with an acceptable OpenCL compliant encoder. If this gets patched in for current users within months that would be fantastic, but in my opinion (so take it with a grain of salt) it is highly unlikely.

#27
sch8mid


    Lenz

  • Members
  • PipPipPipPip
  • 224 posts
  • Gender:Male
  • Location:Germany

Now on the topic of FCS: none of the applications go out of their way to support OpenCL or GCD, so don't expect any speed gains using these applications. Video encoding on GPUs is still at a relatively primitive stage; most encoders don't support two-pass / multi-pass encoding, motion prediction or much of the nice stuff that gets your video smooth. I believe this is why Apple released FCS before Snow Leopard: they couldn't reach a deadline with an acceptable OpenCL compliant encoder. If this gets patched in for current users within months that would be fantastic, but in my opinion (so take it with a grain of salt) it is highly unlikely.


right
isn't FC still 32-bit, using only 2GB max of memory???

come on Randy Ubillos and the FCteam ...


but some interesting news :

AMD/ATI seems to be really dedicated to OpenCL

As of today AMD put out a press release saying the company is awaiting OpenCL certification from the
Khronos Working Group.
On the 8th of August ATI released a beta SDK for x86-based CPUs (certified by Khronos on September 3rd)

and ATI Stream SDK v2.0 will be ready this year (project book + = source forge)

As of today (09/23) we will see the new DirectX 11 cards (RV870), with support for DirectCompute

From a technical point of view these new 40nm cards seem to be light years ahead of their Nvidia counterparts.

We all know as well that, historically, this has always been the case in recent years,
but we obviously very often had to deal with very weak ATI driver support too.

Let's hope that Apple becomes aware of this new situation soon and gives us some alternatives to the green camp.

As an HTPC user, only red cards will find their way into my rig.

Best
as

#28
mitch_de


    InsanelyMacaholic

  • Retired
  • 2,902 posts
  • Gender:Male
  • Location:Stuttgart / Germany
New Galaxies OpenCL Bench V2:
- Apple updated / fixed some OpenCL API usage (may help ATI)
- small speedup (10% on my 9600 GT)

I have now built a 32K and an 8K version: use 32K for fast/high-end GPUs/CPUs and 8K for low-end CPUs/GPUs.
If the GPU is already at its limit, there will be no difference in GPU Gigaflops. But on very fast GPUs 32K may give much higher GPU Gigaflops: more GPU load = less time wasted on OpenCL overhead.

DL links in the first post
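
To illustrate the "more GPU load = less overhead waste" point, here is a toy model where each update pays a fixed overhead on top of the actual compute time. The 300 Gflops peak and the 1 ms overhead are made-up illustration values, not measurements, and the 20 flops per interaction is an assumed convention rather than something confirmed for this benchmark.

```python
def effective_gflops(stars: int, peak_gflops: float, overhead_s: float,
                     flops_per_interaction: int = 20) -> float:
    """Gflops the benchmark would report when every update pays a fixed overhead.

    peak_gflops, overhead_s and flops_per_interaction are hypothetical inputs.
    """
    work = stars * stars * flops_per_interaction   # flops per update (all-pairs)
    compute_s = work / (peak_gflops * 1e9)         # time spent actually computing
    return work / (compute_s + overhead_s) / 1e9   # the overhead dilutes the result


if __name__ == "__main__":
    for label, stars in (("8K", 8192), ("32K", 32768)):
        reported = effective_gflops(stars, peak_gflops=300.0, overhead_s=1e-3)
        print(f"{label}: {reported:.0f} Gflops reported")
```

With these made-up values the 8K run reports noticeably less than the peak while the 32K run gets close to it, because the same fixed overhead is spread over 16 times more work.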

#29
tinush


    InsanelyMac Sage

  • Members
  • PipPipPipPipPip
  • 262 posts
  • Location:Amsterdam

New Galaxies OpenCL Bench V2:
- Apple updated / fixed some OpenCL API usage (may help ATI)
- small speedup (10% on my 9600 GT)

I have now built a 32K and an 8K version: use 32K for fast/high-end GPUs/CPUs and 8K for low-end CPUs/GPUs.
If the GPU is already at its limit, there will be no difference in GPU Gigaflops. But on very fast GPUs 32K may give much higher GPU Gigaflops: more GPU load = less time wasted on OpenCL overhead.

DL links in the first post


Very nice Mitch_de

Results 32k v2.0 at 1680x1050

Sim Vector

S-core cpu 10 G
M-core cpu 18 G
9600gt gpu 149 G
Hybrid M&G 34 G


Thnx
T.

#30
mitch_de


    InsanelyMacaholic

  • Retired
  • 2,902 posts
  • Gender:Male
  • Location:Stuttgart / Germany
Thanks!
Have you also tried the 8K version? Computing 32K stars is very heavy work for all C2D CPUs, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version will give lower GPU Gigaflops but shows extremely fast star movement compared to CPU mode.

#31
lamer0


    InsanelyMac Protégé

  • Members
  • PipPip
  • 95 posts
Mirror for galaxies, I hate rapidshare with a passion.

http://victori.uploa...xies-32k-v2.zip

32k version.

q8200 (quad core)/8600GT - 32 gflops :)
8600GT - 48 gflops --- okay? how is the hybrid approach slower?

#32
mitch_de


    InsanelyMacaholic

  • Retired
  • 2,902 posts
  • Gender:Male
  • Location:Stuttgart / Germany

q8200 (quad core)/8600GT - 32 gflops :)
8600GT - 48 gflops --- okay? how is the hybrid approach slower?


Hybrid is slower than GPU only (and sometimes also slower than CPU only) because it needs much more syncing and data transfer time between CPU and GPU than either the CPU alone or the GPU alone.
The OpenCL bottleneck is the very slow PCIe data transfer compared to the CPU's main memory transfer speed: 2-5 GB/s over PCIe vs up to 50 GB/s for CPU-L2/L3-memory. The GPU itself also has very fast memory access, up to 160 GB/s. But getting the data to the GPU and reading it back is the problem (on fast GPUs ;)
So PCIe bandwidth limits the overall performance benefit of OpenCL (and CUDA).
Some tests show that transfers to and from the GPU may use 80% of the overall time! So the GPU computes very fast, but the time to get data to and from the GPU can be the bottleneck.
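
A back-of-the-envelope sketch of that transfer argument: total GPU time is modelled as bus transfer time plus compute time, with no overlap between the two (a simplifying assumption). The byte counts and timings in the example are invented round numbers chosen so that transfers end up at 80% of the total. The function names are hypothetical.

```python
def transfer_share(bytes_moved: float, pcie_gb_per_s: float, compute_s: float) -> float:
    """Fraction of the total time spent moving data over the bus.

    Assumes transfers and compute do not overlap.
    """
    transfer_s = bytes_moved / (pcie_gb_per_s * 1e9)
    return transfer_s / (transfer_s + compute_s)


def offload_pays_off(cpu_s: float, gpu_compute_s: float,
                     bytes_moved: float, pcie_gb_per_s: float) -> bool:
    """True if GPU compute plus transfers still beats doing the work on the CPU."""
    transfer_s = bytes_moved / (pcie_gb_per_s * 1e9)
    return gpu_compute_s + transfer_s < cpu_s


if __name__ == "__main__":
    # Invented numbers: 1 GB moved at 4 GB/s (0.25 s) around 0.0625 s of GPU compute.
    share = transfer_share(1e9, pcie_gb_per_s=4.0, compute_s=0.0625)
    print(f"transfer share of total time: {share:.0%}")
    print("worth offloading vs a 0.2 s CPU run:",
          offload_pays_off(0.2, 0.0625, 1e9, 4.0))
```

In this made-up case the GPU kernel itself is more than three times faster than the CPU run, but the round trip over the bus makes the offload a net loss.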

For example, a Mac Pro 2009 may get higher Gigaflops CPU only than with a GT 120 GPU. Reason: GPU too slow + PCIe transfer times.
The same GPU on a low-end C2D system is much faster than the C2D CPU alone.

PCIe transfer speed is also a "problem" for CoreImage.
This is one reason why in the past, when CI was first used on AGP Macs, CI had bad performance and got a bit "lost".
AGP bandwidth is very, very bad in the direction from GPU to CPU: less than 250 MB/s. The other direction, CPU > GPU (the normal gaming path), gets up to 1 GB/s.
So they made PCIe, which was much better, but I think that because of the upcoming very fast 5870 + GT 300 in the next 2 years they will need to update PCIe again to a faster speed.


#33
tinush


    InsanelyMac Sage

  • Members
  • PipPipPipPipPip
  • 262 posts
  • Location:Amsterdam

Thanks!
Have you also tried the 8K version? Computing 32K stars is very heavy work for all C2D CPUs, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version will give lower GPU Gigaflops but shows extremely fast star movement compared to CPU mode.


not yet, will test this later
just finished a perfect retail 100% working snow install (incl auto-sleep & keyboard/mouse wake) :blink:

T.

#34
shoarthing


    InsanelyMac Legend

  • Members
  • PipPipPipPipPipPipPip
  • 846 posts
  • Location:Blighty

Thanks!
Have you also tried the 8K version? Computing 32K stars is very heavy work for all C2D CPUs, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version will give lower GPU Gigaflops but shows extremely fast star movement compared to CPU mode.

. . Rapidshare link for the 8K_V2 doesn't work.

Edit: Sorry - link works fine - my ISP has just now started to block Rapidshare

#35
osssua


    InsanelyMac Protégé

  • Just Joined
  • Pip
  • 2 posts

. . Rapidshare link for the 8K_V2 doesn't work.

(tried 2 browsers, & jdownloader: my ISP doesn't block Rapidshare)

Am trying to bench an Atom330 ION MCP79/7A motherboard so only this version likely to run at all . . would appreciate if some kind soul would mirror this version to a working link.

TIA



Galaxies now works on the ATI 4870 with the 10.6.2 seed

http://netkas.org/?p=240

#36
shoarthing


    InsanelyMac Legend

  • Members
  • PipPipPipPipPipPipPip
  • 846 posts
  • Location:Blighty
Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]

CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.

. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini

#37
Schenkenberg


    InsanelyMac Protégé

  • Members
  • Pip
  • 16 posts
Awesome post!

Galaxies 32k running on 1900x1200 in SL 10.6.1
i7 920 @ 3.4GHz
GTX 260

Vector Single Core CPU: 14
Vector Multi Core: 57 (that's what I call proper multi-core scaling!)
GPU: 275! Came up from 180 with the 8k benchmark!
CPU+GPU: 95

I really love the 260's performance. For that pricepoint (got for €140) it really shines.
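
To put a number on the multi-core scaling, here is a tiny sketch that turns the single-core and multi-core Gflops figures into a speedup and a per-core efficiency. The core counts are an assumption on my part, since the post does not state them: the i7 920 has 4 physical cores and 8 hardware threads with Hyper-Threading.

```python
def scaling(single_gflops: float, multi_gflops: float, cores: int) -> tuple[float, float]:
    """Return (speedup, efficiency per core) for a multi-core result."""
    speedup = multi_gflops / single_gflops
    return speedup, speedup / cores


if __name__ == "__main__":
    # 14 and 57 Gflops are the figures from the post; the core counts are assumed.
    for cores in (4, 8):
        speedup, efficiency = scaling(14.0, 57.0, cores)
        print(f"{cores} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

Measured against the 4 physical cores the scaling is slightly better than linear, which would fit Hyper-Threading contributing a little extra.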

#38
n00b32


    InsanelyMac Protégé

  • Members
  • Pip
  • 10 posts
Hi,

I added a table for better comparison of the OpenCL benchmarks:

http://wiki.osx86pro...ndex.php/OpenCL

What would be the best benchmark for evaluation of OpenCL performance? Galaxies?

@mitch_de:

could you show a build number in this standard benchmark while it is running (for better comparison)?

Thanks

Jason

#39
nvidia2008


    InsanelyMac Protégé

  • Just Joined
  • Pip
  • 2 posts

Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]

CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.

. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini


MacBook 2.0GHZ Aluminium 4GB DDR3 RAM

Galaxies 8K V2
GPU 9400M mode 20 Gflops/ 15 Updates
Hybrid mode CPU+GPU 5 Gflops/ 60 Updates (crashed quite a lot but when working I took this measurement)

Mac OS X 10.6.1

#40
shoarthing


    InsanelyMac Legend

  • Members
  • PipPipPipPipPipPipPip
  • 846 posts
  • Location:Blighty

MacBook 2.0GHZ Aluminium 4GB DDR3 RAM

Galaxies 8K V2
GPU 9400M mode 20 Gflops/ 15 Updates
Hybrid mode CPU+GPU 5 Gflops/ 60 Updates (crashed quite a lot but when working I took this measurement)

Mac OS X 10.6.1

. . thank you *very* much for posting this: I knew the GPU & Shader clocks were supposed to be the same on the MCP7x variants; but nice to have it confirmed.

Surprised the Macbook's DDR3 didn't make a solid difference tho' . . . . .




