All OpenCL Benches: RAYTRACING/Galaxies/Grass/qJulia/Displacement...
Started by mitch_de, Aug 30 2009 07:08 PM
124 replies to this topic
#21
Posted 05 September 2009 - 12:39 AM
Well, a small update.
As of the new version, running SSE4 and no VSync, I'm getting:
70 Updates/s and 23 Gflops
#22
Posted 07 September 2009 - 03:19 PM
UPDATE:
The ATI devs gave me a hint to increase the count from 4K to 16K.
CPU Gigaflops will stay the same.
But the GPU can show more performance, because with "only" 4K (the count of things to compute) faster GPUs like the GTX 285 are not at their limit!
Even my 9600 GT goes from 97 Gigaflops (4K) to 112 Gigaflops (16K). The CPU can't compute more (white flag), so 16K CPU Gigaflops = 4K CPU Gigaflops!
The new version shows 16K in the result legends and is also built without VSync and with SSE4 optimizations.
Download link in the first post.
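For anyone who wants to sanity-check posted numbers, here is a rough sketch of how Updates/s and Gflops could relate. The thread never says how the benchmark computes Gflops, so everything here is an assumption: that the "count" is the number of stars, that every star interacts with every other star on each update, and that one interaction costs 20 floating-point operations (a common convention for N-body benchmarks). The function names are made up for illustration. Under those assumptions, 70 Updates/s at 4K comes out at roughly the 23 Gflops reported above.

```python
# Rough Gflops estimate for an all-pairs N-body update.
# ASSUMPTION (not stated in the thread): 20 flops per star-star interaction.
FLOPS_PER_INTERACTION = 20


def nbody_gflops(stars: int, updates_per_second: float) -> float:
    """Estimated Gflops when every star interacts with every star each update."""
    interactions = stars * stars
    return interactions * FLOPS_PER_INTERACTION * updates_per_second / 1e9


def updates_needed(stars: int, gflops: float) -> float:
    """Updates per second required to sustain `gflops` at a given star count."""
    return gflops * 1e9 / (stars * stars * FLOPS_PER_INTERACTION)


if __name__ == "__main__":
    # 4K stars at 70 Updates/s
    print(round(nbody_gflops(4096, 70), 1))
    # Updates/s implied by 97 Gflops at 4K versus 112 Gflops at 16K
    print(round(updates_needed(4096, 97), 1))
    print(round(updates_needed(16384, 112), 1))
```

If the model holds, a GPU at 4K has to push through hundreds of very short updates per second, so per-update costs weigh heavily; at 16K each update carries 16 times more work.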
#23
Posted 10 September 2009 - 05:47 AM
1. i920 overclocked to 4.4GHZ - 87 Gigaflops
2. eVGA GTX-285 (100MHz pcie) - 306 Gigaflops
#24
Posted 16 September 2009 - 06:12 PM
Not sure if this is the correct topic / thread
Former Apple Final Cut Pro engineer,
Very interested in OpenCL and Final Cut Studio 3, Compressor specifically.
I have done a bunch of tests with Compressor 3.5 / Qmaster on Leopard 10.5.8 and the new Nehalem Mac Pros.
I tested 14 instances (cores) in Qmaster with Compressor 3.5 and submitted a 40-minute DVCPRO HD clip for batch / parallel conversion to H.264 in 5 separate queues.
It took only 1 hour to process the 40-minute clip, which is pretty good for 5 separate output clips.
Snow Leopard of course has Grand Central to make this even better, and I will eventually test that.
My main interest is to test OpenCL, specifically with Compressor / FCP.
I am still wondering if Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL
My MacPro is a 2006 MacPro, so I can not test it, I suppose I could do a Barefeats deal and get a Radeon ATI 4870 and test it.
Or I will have to get to my colleague's new Nehalem Mac Pro with dual nVidia GT 120s, although according to Barefeats the GT 120 is the weakest OpenCL card.
I am very interested in developing this thread along these lines (Compressor 3.5, Final Cut Pro 7 and OpenCL); it could be awesome stuff that saves a ton of time.
thanks in advance
#25
Posted 16 September 2009 - 08:51 PM
The ATI Apple dev told me that they didn't make the deadline for the 10.6.1 update to fix OpenCL on ATI GPUs.
They will fix it as soon as possible, but that may not be before 10.6.2 (some weeks to wait).
OpenCL only works with Nvidia GPUs today.
Also, OpenCL only comes into play if an application uses the OpenCL framework, so only newly developed apps will use OpenCL. "Old" apps, meaning your already installed apps, will not.
#26
Posted 17 September 2009 - 11:59 PM
macguitarm, on Sep 17 2009, 04:12 AM, said:
I am still wondering if Compressor 3.5 / Final Cut Pro 7 has been written to take advantage of OpenCL
My MacPro is a 2006 MacPro, so I can not test it, I suppose I could do a Barefeats deal and get a Radeon ATI 4870 and test it.
The 4870 isn't well supported for OpenCL. I've got a Mac Pro '06 and it performs almost the same as my CPU, and most of the time it crashes. As Mitch mentioned, we'll have to wait for the Apple ATI dev team to fix these problems.
Now on the topic of FCS: none of the applications go out of their way to support OpenCL or GCD, so don't expect any speed gains with them. Video encoding on GPUs is still at a relatively primitive stage; most encoders don't support two-pass / multi-pass encoding, motion prediction or much of the nice stuff that gets your video smooth. I believe this is why Apple released FCS before Snow Leopard: they couldn't reach a deadline with an acceptable OpenCL-compliant encoder. If this gets patched in for current users within months that would be fantastic, but in my opinion (so take it with a grain of salt) that is highly unlikely.
#27
Posted 23 September 2009 - 06:13 AM
wesux, on Sep 17 2009, 11:59 PM, said:
Now on the topic of FCS, none of the applications go out of the way to support OpenCL or GCD so don't expect any speed gains using these applications. Video encoding to GPUs is still in a relatively primitive stage, most encoders don't support two pass/ multi passing, motion prediction or much of the nice stuff that gets your video smooth. I believe this is why Apple released FCS before Snow Leopard, they couldn't reach a deadline with an acceptable OpenCL compliant encoder. Now if this is patched up for current users in months that would be fantastic but in my opinion, so take it with a grain of salt, is highly unlikely.
Right.
Isn't FC still 32-bit, using only 2 GB of memory max?
Come on, Randy Ubillos and the FC team ...
But some interesting news:
AMD/ATI seems to be really dedicated to OpenCL.
As of today AMD released a press statement that the company is awaiting OpenCL certification from the Khronos Working Group.
On the 8th of August ATI released a beta SDK for x86-based CPUs (certified by Khronos on September 3rd)
and ATI Stream SDK v2.0 will be ready this year (project book + = source forge).
As of today (09/23) we will see the new DirectX 11 cards (RV870) with support for DirectCompute.
From a technical point of view these new 40 nm cards seem to be light years ahead of their Nvidia counterparts.
We all know that, from a historical point of view, this was always the case in the last years,
but we also very often had to deal with very weak ATI driver support.
Let's hope that Apple becomes aware of this new situation soon and gives us some alternatives to the green camp.
As an HTPC user, only red cards will find their way into my rig.
Best
as
#28
Posted 29 September 2009 - 06:02 AM
New Galaxies OpenCL Bench V2:
- Apple updated / fixed some OpenCL API usage (may help ATI)
- small speedup (10% on my 9600 GT)
I have now built a 32K and an 8K version: use 32K for fast / high-end GPUs and CPUs, and 8K for low-end ones.
If the GPU is already at its limit, there will be no difference in GPU Gigaflops. But on very fast GPUs 32K may give much higher GPU Gigaflops: more GPU load = less time wasted on OpenCL overhead.
Download links in the first post.
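The "more load = less overhead" point can be sketched with a toy model: each update pays a fixed overhead (kernel launch, syncing) plus compute time at the GPU's peak rate. The peak rate, the overhead value, the 20 flops per interaction, and the function name are all hypothetical; none of them come from the benchmark itself.

```python
def effective_gflops(stars: int, peak_gflops: float, overhead_s: float,
                     flops_per_interaction: int = 20) -> float:
    """Gflops actually achieved when each update pays a fixed overhead.

    All parameters are illustrative assumptions, not measured values.
    """
    work = stars * stars * flops_per_interaction  # flops per update
    compute_s = work / (peak_gflops * 1e9)        # time spent computing
    return work / (compute_s + overhead_s) / 1e9


if __name__ == "__main__":
    # Hypothetical GPU: 300 Gflops peak, 1 ms fixed overhead per update.
    for stars in (8192, 32768):
        print(stars, round(effective_gflops(stars, 300.0, 0.001), 1))
```

With these made-up numbers the 32K run lands much closer to the peak than the 8K run, because the same fixed overhead is spread over 16 times more work. A slow GPU spends so long computing that the overhead barely matters at either size.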
#29
Posted 29 September 2009 - 11:54 AM
mitch_de, on Sep 29 2009, 08:02 AM, said:
New Galaxies OpenCL Bench V2:
- Apple updated / fixed some OpenCL API usage (maybe help ATI)
- little speed up (10% on my GT 9600)
Now i build an 32K and an 8K Version - 32K use for fast/highend GPU/CPU and 8K for lowend CPU/GPUs.
If GPU limits, there will be no difference in GPU Gigaflops. But on very fast GPUs 32K may give much higher GPU Gigaflops - more GPU load=less waste of OpenCL overhead time.
DL Links on 1 post
Very nice Mitch_de
Results 32k v2.0 at 1680x1050
Sim Vector
S-core cpu 10 G
M-core cpu 18 G
9600gt gpu 149 G
Hybrid M&G 34 G
Thnx
T.
#30
Posted 29 September 2009 - 04:54 PM
Thanks !
Have you also tried the 8K version? Computing 32K stars is very heavy work for all C2D CPUs, so you see very little star movement in CPU mode (most C2Ds get less than 1 FPS in CPU mode). The 8K version will give fewer GPU Gigaflops but shows extremely fast star movement compared to the CPU.
#31
Posted 30 September 2009 - 06:05 AM
Mirror for galaxies, I hate rapidshare with a passion.
http://victori.uploa...xies-32k-v2.zip
32k version.
q8200(quad core)/8600GT - 32gflops
8600GT - 48gflops --- okay? how is the hybrid approach slower?
#32
Posted 30 September 2009 - 10:12 AM
lamer0, on Sep 30 2009, 08:05 AM, said:
q8200(quad core)/8600GT - 32gflops
8600GT - 48gflops --- okay? how is the hybrid approach slower?
Hybrid is slower than GPU only (and sometimes also slower than CPU only) because it needs much more syncing and data transfer time between CPU and GPU than either one needs alone.
The OpenCL bottleneck is the very slow PCIe data transfer compared to the CPU's main memory transfer speed: 2-5 GB/s over PCIe vs up to 50 GB/s between CPU, L2/L3 and memory. The GPU itself also has very fast memory access, up to 160 GB/s. But getting the data to the GPU and reading it back is the problem (on fast GPUs).
So PCIe bandwidth limits the overall performance benefit of OpenCL (and CUDA).
Some tests show that transfers to and from the GPU may take 80% of the overall time! So the GPU computes very fast, but the time to get data to and from the GPU can be the bottleneck.
For example, a Mac Pro 2009 may get higher Gigaflops CPU only than with a GT 120 GPU. Reason: GPU too slow + PCIe transfer times.
The same GPU on a low-end C2D system is much faster than the C2D CPU alone.
PCIe transfer speed is also a "problem" for Core Image.
This is one reason why in the past, when CI was first used on AGP Macs, CI had bad performance and got a bit "lost".
AGP bandwidth is very, very bad in the direction from GPU to CPU: less than 250 MB/s. In the other direction, CPU to GPU (the normal gaming path), it is up to 1 GB/s.
So they made PCIe, which was much better, but I think that because of the very fast 5870 and GT 300 coming in the next 2 years they will need to update PCIe again to a faster speed.
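A back-of-the-envelope sketch of the transfer bottleneck described above: the share of total time spent moving data over PCIe for a given amount of data and compute time. Only the 2-5 GB/s PCIe range comes from the post; the data size, the compute time, and the function name are hypothetical values picked for illustration.

```python
def transfer_fraction(bytes_moved: float, pcie_gb_per_s: float,
                      compute_s: float) -> float:
    """Share of total time spent on PCIe transfers (0.0 to 1.0).

    Assumes transfers and compute do not overlap, which is a simplification.
    """
    transfer_s = bytes_moved / (pcie_gb_per_s * 1e9)
    return transfer_s / (transfer_s + compute_s)


if __name__ == "__main__":
    data = 256e6       # hypothetical: 256 MB moved to and from the GPU
    compute = 0.030    # hypothetical: 30 ms of GPU compute
    for bandwidth in (2.0, 5.0):  # PCIe range quoted in the post, in GB/s
        share = transfer_fraction(data, bandwidth, compute)
        print(f"{bandwidth} GB/s: {share:.0%} of the time is transfer")
```

With these made-up inputs the slow end of the PCIe range puts transfers at around 80% of the total time, in line with the figure mentioned above. A workload that keeps its data on the GPU between updates would avoid most of this cost.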
#33
Posted 30 September 2009 - 06:14 PM
mitch_de, on Sep 29 2009, 06:54 PM, said:
Thanks !
Have you tried also the 8K Version ? 32K star compute is very heavy work for all C2D CPUs, so very less star moving seen in CPU Mode (most C2D get less than 1 FPS/sec in CPU Mode). 8K Version will give less GPU Gigaflops but shows extrem fast star moving compared to the CPU star moving.
not yet, will test this later
just finished a perfect retail 100% working snow install (incl auto-sleep & keyboard/mouse wake)
T.
#34
Posted 06 October 2009 - 09:31 AM
mitch_de, on Sep 29 2009, 05:54 PM, said:
Thanks !
Have you tried also the 8K Version ? 32K star compute is very heavy work for all C2D CPUs, so very less star moving seen in CPU Mode (most C2D get less than 1 FPS/sec in CPU Mode). 8K Version will give less GPU Gigaflops but shows extrem fast star moving compared to the CPU star moving.
Edit: Sorry - link works fine - my ISP has just now started to block Rapidshare
#35
Posted 06 October 2009 - 01:37 PM
shoarthing, on Oct 6 2009, 09:31 AM, said:
. . Rapidshare link for the 8K_V2 doesn't work.
(tried 2 browsers, & jdownloader: my ISP doesn't block Rapidshare)
Am trying to bench an Atom330 ION MCP79/7A motherboard so only this version likely to run at all . . would appreciate if some kind soul would mirror this version to a working link.
TIA
Galaxies now work in ATI 4870 with 10.6.2 seed
http://netkas.org/?p=240
#36
Posted 07 October 2009 - 06:49 AM
Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]
CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.
. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini
#37
Posted 07 October 2009 - 01:22 PM
Awesome post!
Galaxies 32k running on 1900x1200 in SL 10.6.1
i7 920 @ 3.4GHz
GTX 260
Vector Single Core CPU: 14
Vector Multi Core: 57 (that's what I call proper multi-core scaling!)
GPU: 275! Came up from 180 with the 8k benchmark!
CPU+GPU: 95
I really love the 260's performance. For that pricepoint (got for €140) it really shines.
#38
Posted 09 October 2009 - 08:44 AM
Hi,
I added a table for better comparison of the OpenCL benchmarks:
http://wiki.osx86pro...ndex.php/OpenCL
What would be the best benchmark for evaluation of OpenCL performance? Galaxies?
@mitch_de:
could you show a build number in this standard benchmark while it runs (for better comparison)?
Thanks
Jason
#39
Posted 10 October 2009 - 01:08 PM
shoarthing, on Oct 7 2009, 02:49 PM, said:
Zotac ION-ITX with its integrated 9400M alone managed 20 Gflops/15 updates - using 8K V2 version [obv same Gflops w/ 32K version but v low updates]
CPU [Atom330]+GPU 3~4 updates & 4~5 Gflops.
. . . v interested how this MCP7A w/ DDR2 compares with a current MCP79 & DDR3 9400M Macbook/Mac mini
MacBook 2.0GHZ Aluminium 4GB DDR3 RAM
Galaxies 8K V2
GPU 9400M mode 20 Gflops/ 15 Updates
Hybrid mode CPU+GPU 5 Gflops/ 60 Updates (crashed quite a lot but when working I took this measurement)
Mac OS X 10.6.1
#40
Posted 10 October 2009 - 02:16 PM
nvidia2008, on Oct 10 2009, 02:08 PM, said:
MacBook 2.0GHZ Aluminium 4GB DDR3 RAM
Galaxies 8K V2
GPU 9400M mode 20 Gflops/ 15 Updates
Hybrid mode CPU+GPU 5 Gflops/ 60 Updates (crashed quite a lot but when working I took this measurement)
Mac OS X 10.6.1
Surprised the Macbook's DDR3 didn't make a solid difference tho' . . . . .