
OpenCL Benchmark - CPU vs GPU / DO NOT USE ANYMORE !


100 replies to this topic

#41
nofearl


THANKS !
Question: Do you have 2 identical GPUs?
Also (it will not change the times), always use the latest build, which is V020.
It has a lot of error handling code for ATI users (NVIDIA cards seem to run without errors so far :thumbsup_anim: )


Yep, 2 Palit NVIDIA 9600 GT cards in 2 PCIe x16 slots.

#42
El.Pilote

Hi all, here is mine:

Posted Image

:thumbsup_anim:

#43
grue

Doesn't seem to test my setup correctly.

CL_DEVICE_NAME: Intel® Xeon® CPU X5365 @ 3.00GHz
CL_DEVICE_VENDOR: Intel
Now computing - please be patient....
time used: 7.710562
Number of elements computed: 2097152
CL_DEVICE_NAME: GeForce 8800 GT
CL_DEVICE_VENDOR: NVIDIA
Now computing - please be patient....
time used: 2.492461
Number of elements computed: 2097152
CL_DEVICE_NAME: GeForce 8800 GT
CL_DEVICE_VENDOR: NVIDIA
Now computing - please be patient....
time used: 2.489143
Number of elements computed: 2097152
Now checking if results are valid - please be patient....
:thumbsup_anim: Validate results test passed - GPU=CPU :D



I have an 8800GT in Slot 3 working as the helper card to a GTX260. Looks like it's testing the 8800GT twice.

#44
mitch_de


Doesn't seem to test my setup correctly.


I have an 8800GT in Slot 3 working as the helper card to a GTX260. Looks like it's testing the 8800GT twice.



NEW VERSION is out - download V025. Lots of changes (and hopefully fixes for the ATI and > 2 GPU tests).

#45
grue

BINGO


Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 8800 GT
Device 0 is an: GPU with max. 1500 MHz and 112 units/cores
Now computing - please be patient....
time used: 0.683 seconds

OpenCL Device # 1 = GeForce GTX 260
Device 1 is an: GPU with max. 1400 MHz and 216 units/cores
Now computing - please be patient....
time used: 0.365 seconds

OpenCL Device # 2 = Intel® Xeon® CPU X5365 @ 3.00GHz
Device 2 is an: CPU with max. 3000 MHz and 8 units/cores
Now computing - please be patient....
time used: 3.094 seconds

Now checking if results are valid - please be patient....
:thumbsup_anim: Validate test passed - GPU results=CPU results :D

#46
mitch_de


BINGO


Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 8800 GT
Device 0 is an: GPU with max. 1500 MHz and 112 units/cores
Now computing - please be patient....
time used: 0.683 seconds

OpenCL Device # 1 = GeForce GTX 260
Device 1 is an: GPU with max. 1400 MHz and 216 units/cores
Now computing - please be patient....
time used: 0.365 seconds

OpenCL Device # 2 = Intel® Xeon® CPU X5365 @ 3.00GHz
Device 2 is an: CPU with max. 3000 MHz and 8 units/cores
Now computing - please be patient....
time used: 3.094 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)


YEAH !
I hope ATI users will no longer get a frozen system either - their GPUs got overloaded by the old code, NVIDIA GPUs did not :star_smile:

Remember: the V025 time results can't be compared 1:1 with those of the old versions.
That's because of the code changes for ATI users :wacko:

#47
proengin

Here are my scores from V0.25 script:

...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce GTX 285
Device 0 is an: GPU with max. 1584 MHz and 240 units/cores
Now computing - please be patient....
time used: 0.231 seconds

OpenCL Device # 1 = Intel® Core™ i7 CPU 920 @ 2.67GHz
Device 1 is an: CPU with max. 4280 MHz and 8 units/cores
Now computing - please be patient....
time used: 1.296 seconds

Now checking if results are valid - please be patient....
:rolleyes: Validate test passed - GPU results=CPU results :)

#48
gzfelix

...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 3
OpenCL Device # 0 = Radeon HD 4870
Device 0 is an: GPU with max. 750 MHz and 4 units/cores
Now computing - please be patient....
time used: 4.126 seconds

OpenCL Device # 1 = GeForce GT 120
Device 1 is an: GPU with max. 1400 MHz and 32 units/cores
Error: clBuildProgram for device # 1
ERROR NUMBER = -11
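
For anyone decoding that number: in the Khronos cl.h header, -11 is CL_BUILD_PROGRAM_FAILURE, i.e. the kernel source did not compile for that device. Below is a minimal sketch of a lookup the bench could print next to the raw number. The names cl_error_name and is_build_failure and the BENCH_ constants are made up for illustration (they are not from mitch's source), only a few codes are listed, and the values are redefined locally so the snippet compiles without the OpenCL headers.

```c
/* Error values as defined in the Khronos cl.h header. They are redefined
   here with a hypothetical BENCH_ prefix so this sketch builds without
   the OpenCL headers installed. */
enum {
    BENCH_CL_SUCCESS                = 0,
    BENCH_CL_DEVICE_NOT_FOUND       = -1,
    BENCH_CL_DEVICE_NOT_AVAILABLE   = -2,
    BENCH_CL_COMPILER_NOT_AVAILABLE = -3,
    BENCH_CL_OUT_OF_RESOURCES       = -5,
    BENCH_CL_OUT_OF_HOST_MEMORY     = -6,
    BENCH_CL_BUILD_PROGRAM_FAILURE  = -11
};

/* Map a numeric OpenCL error to its symbolic name (partial list). */
const char *cl_error_name(int code)
{
    switch (code) {
    case BENCH_CL_SUCCESS:                return "CL_SUCCESS";
    case BENCH_CL_DEVICE_NOT_FOUND:       return "CL_DEVICE_NOT_FOUND";
    case BENCH_CL_DEVICE_NOT_AVAILABLE:   return "CL_DEVICE_NOT_AVAILABLE";
    case BENCH_CL_COMPILER_NOT_AVAILABLE: return "CL_COMPILER_NOT_AVAILABLE";
    case BENCH_CL_OUT_OF_RESOURCES:       return "CL_OUT_OF_RESOURCES";
    case BENCH_CL_OUT_OF_HOST_MEMORY:     return "CL_OUT_OF_HOST_MEMORY";
    case BENCH_CL_BUILD_PROGRAM_FAILURE:  return "CL_BUILD_PROGRAM_FAILURE";
    default:                              return "unknown OpenCL error";
    }
}

/* True when the error means the kernel failed to compile for the device. */
int is_build_failure(int code)
{
    return code == BENCH_CL_BUILD_PROGRAM_FAILURE;
}
```

When a build fails like this, the compiler's own message can normally be fetched with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG, which would say why the GT 120 rejected the kernel.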


#49
vidkidd

This just shows how SAD the ATI Drivers currently are!!! OUCH!!!!


_tool-1/OpenCL2_Bench_V025 ; exit;
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = Radeon HD 4870
Device 0 is an: GPU with max. 750 MHz and 4 units/cores
Now computing - please be patient....
time used: 4.065 seconds

OpenCL Device # 1 = Intel® Xeon® CPU 5160 @ 3.00GHz
Device 1 is an: CPU with max. 3000 MHz and 4 units/cores
Now computing - please be patient....
time used: 6.079 seconds

Now checking if results are valid - please be patient....
:thumbsup_anim: Validate test passed - GPU results=CPU results :bag:
logout

[Process completed]

#50
real3x

tool-1/OpenCL2_Bench_V025; exit;
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce GTX 260
Device 0 is an: GPU with max. 1242 MHz and 192 units/cores
Now computing - please be patient....
time used: 0.357 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz
Device 1 is an: CPU with max. 3600 MHz and 2 units/cores
Now computing - please be patient....
time used: 10.433 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)
logout


#51
moondark

Hello mitch, can you make the source code available? Thanks!! :P


My results:
Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 9600M GT
Device 0 is an: GPU with max. 1250 MHz and 32 units/cores
Now computing - please be patient....
time used: 2.798 seconds

OpenCL Device # 1 = GeForce 9400M
Device 1 is an: GPU with max. 1100 MHz and 16 units/cores
Now computing - please be patient....
time used: 9.549 seconds

OpenCL Device # 2 = Intel® Core™2 Duo CPU P8600 @ 2.40GHz
Device 2 is an: CPU with max. 2400 MHz and 2 units/cores
Now computing - please be patient....
time used: 15.800 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results <_<

#52
VCH888

Hi mitch_de,

here are my results.
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = Radeon HD 4870
Device 0 is an: GPU with max. 750 MHz and 4 units/cores
Now computing - please be patient....
time used: 3.997 seconds

OpenCL Device # 1 = Intel® Core™2 Duo CPU E8400 @ 3.00GHz
Device 1 is an: CPU with max. 4000 MHz and 2 units/cores
Now computing - please be patient....
time used: 11.982 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :)

#53
mitch_de


Something like
for(i=0;i<5;i++)
for(loop....
should be enough: just add one line (and one for int i;)

Hi netkas & all other coding heroes:
The SOURCE CODE is now available (as an Xcode project). It would be great to get a much better benchmark.
Feel free to change anything and share your work!

http://freenet-homep...OpenCL2_SRC.zip

Latest OpenCL specification / manual:
http://www.khronos.o...encl-1.0.43.pdf

#54
ricola

...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 9400 GT
Device 0 is an: GPU with max. 1375 MHz and 16 units/cores
Now computing - please be patient....
time used: 3.992 seconds

OpenCL Device # 1 = Intel® Core™2 CPU E7500 @ 2.93GHz
Device 1 is an: CPU with max. 3666 MHz and 2 units/cores
Now computing - please be patient....
time used: 12.048 seconds

Now checking if results are valid - please be patient....
:wacko: Validate test passed - GPU results=CPU results :)

Thanks

#55
antic

...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 9500 GT
Device 0 is an: GPU with max. 1350 MHz and 32 units/cores
Now computing - please be patient....
time used: 3.053 seconds

OpenCL Device # 1 = Intel® Core™2 CPU 6600 @ 2.40GHz
Device 1 is an: CPU with max. 3800 MHz and 2 units/cores
Now computing - please be patient....
time used: 15.188 seconds

Now checking if results are valid - please be patient....
:| Validate test passed - GPU results=CPU results :D
logout

#56
johan


Here is my "updated" score from SL.

...........................................................
.................. OpenCL Bench V 0.15 by mitch ...........
...... C2D 3GHz = 30 sec vs Nvidia 9600GT = 3.10 sec ......
....... .......
........My test code (simple adds) is cpu friedly..........
.more gpu friedly+complexer code (raytracing/video encod.).
... may give much more speed advantage - at least on C2Ds .
...........................................................

....CL_DEVICE_NAME: Intel® Core™ i7 CPU 920 @ 2.67GHz .....
CL_DEVICE_VENDOR: Intel
CL_DEVICE_MAX_CLOCK_FREQUENCY: 4280 MHz
CL_DEVICE_MAX_COMPUTE_UNITS: 8
Now computing - please be patient....
time used: 3.834852
Number of elements computed: 2097152

....CL_DEVICE_NAME: GeForce GTX 285 .....
CL_DEVICE_VENDOR: NVIDIA
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1584 MHz
CL_DEVICE_MAX_COMPUTE_UNITS: 240
Now computing - please be patient....
time used: 0.861248
Number of elements computed: 2097152
Now checking if results are valid - please be patient....
:D Validate results test passed - GPU=CPU :)

This program seems to multi-thread very well according to SL's CPU Usage monitor.

8 units for the intel cpu?

how is that possible?

#57
mitch_de

i7 920 -
8 units for the intel cpu?
how is that possible?


The i7 has 4 real cores and 4 virtual (Hyper-Threading) cores = 8 cores :D

#58
Raul Ramos

My time with a 8800GTS a bit OCed...

Last login: Thu Aug 27 17:55:56 on ttys000
/Users/raulmiguel/Downloads/OpenCLBench_as_terminal_tool/OpenCL2_Bench_V025; exit;
Raul-Miguels-Mac:~ raulmiguel$ /Users/raulmiguel/Downloads/OpenCLBench_as_terminal_tool/OpenCL2_Bench_V025; exit;
...........................................................
.................. OpenCL Bench V 0.25 by mitch ...........
...... C2D 3GHz = 12 sec vs Nvidia 9600GT = 0,93 sec ......
... time results are not comparable to older version! .....
...........................................................

Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 8800 GTS 512
Device 0 is an: GPU with max. 1750 MHz and 128 units/cores 
Now computing - please be patient....
time used:  0.621 seconds

OpenCL Device # 1 = Intel(R) Core(TM)2 Quad CPU	Q6600  @ 2.40GHz
Device 1 is an: CPU with max. 3800 MHz and 4 units/cores 
Now computing - please be patient....
time used:  5.866 seconds

Now checking if results are valid - please be patient....
:) Validate test passed - GPU results=CPU results :) 
logout

[Process completed]

cya

#59
oscarbg

Please change

__kernel void vectorAdd(
    __global const float * a,
    __global const float * b,
    __global float * c)
{
    // Vector element index
    int loop;
    int test1;
    int nIndex = get_global_id(0);
    for (loop=1; loop< 1000; loop++)
    {
        c[nIndex] = a[nIndex] + b[nIndex];
        c[nIndex] = c[nIndex] * (a[nIndex] + b[nIndex]);
        c[nIndex] = c[nIndex] * (a[nIndex] / 2.0 );
    }
}

to use float4. That should use the vector units on CPUs (SSE) and on ATI GPUs (improving performance by up to 4x), and it should also improve memory fetches on ATI GPUs (which are optimized for float4).

If you are brave enough, use float8 (it should then be ready for the Sandy Bridge AVX extensions), or, if you are braver still, float16 (ready for Larrabee).

Also note that

loop=1; loop< 1000;

only executes 1000-1 = 999 times.

With float4, loop 250 times:

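__kernel void vectorAdd(
    __global const float4 * a,
    __global const float4 * b,
    __global float4 * c)
{
    // Vector element index: each work-item now handles one float4 (4 floats),
    // so the host must launch a quarter as many work-items.
    int loop;
    int nIndex = get_global_id(0);
    // 250 iterations x 4 lanes = 1000 scalar evaluations per work-item
    for (loop=0; loop< 250; loop++)
    {
        c[nIndex] = a[nIndex] + b[nIndex];
        c[nIndex] = c[nIndex] * (a[nIndex] + b[nIndex]);
        // 2.0f keeps the scalar a float, matching the float4 element type
        c[nIndex] = c[nIndex] * (a[nIndex] / 2.0f );
    }
}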
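
To make the loop arithmetic concrete, here is a small plain-C sketch (host side, no OpenCL needed) that counts the iterations of both loop forms and emulates what one work-item of the scalar kernel leaves in c[nIndex]. The function names are hypothetical, and the 2097152-element figure in the usage note is taken from the "Number of elements computed" line of the older bench logs, so treat it as an assumption about the buffer size.

```c
#include <stddef.h>

/* How often the body of  for (loop = first; loop < limit; loop++)  runs. */
int loop_iterations(int first, int limit)
{
    int count = 0;
    for (int loop = first; loop < limit; loop++) {
        count++;
    }
    return count;
}

/* Host-side emulation of one work-item of the scalar kernel.
   Every iteration starts again from a + b, so the final value does not
   depend on the iteration count; the loop only adds run time.
   (The kernel writes 2.0; 2.0f is used here to stay in single precision.) */
float vector_add_reference(float a, float b, int iterations)
{
    float c = 0.0f;
    for (int loop = 0; loop < iterations; loop++) {
        c = a + b;
        c = c * (a + b);
        c = c * (a / 2.0f);
    }
    return c;
}

/* Work-items needed when each one handles floats_per_item floats
   (1 for the float kernel, 4 for the float4 kernel). */
size_t work_items_for(size_t n_floats, size_t floats_per_item)
{
    return n_floats / floats_per_item;
}
```

So loop_iterations(1, 1000) gives 999 and loop_iterations(0, 250) gives 250, and for 2097152 floats the float4 kernel would need work_items_for(2097152, 4) = 524288 work-items. Because the result is independent of the loop count, a CPU-side validation only needs to compute one iteration per element.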

:P
The download link is at the end of that post.
- Mac OS X 10.6 - Snow Leopard ONLY ! (will not run in 10.5 / 10.4....)

UPDATED OpenCL_BENCH to V025
V025 changes
- fixed the bug that showed the same card twice in systems with > 2 GPUs (hopefully fixed)
- ATI compatible: reduced the size of the GPU code to fix the ATI problems with this bench
- therefore the time results can't be compared 100% to old versions - faster on CPU+GPU
GPU time new = GPU time old / 3.3
CPU time new = CPU time old / 2.5
So the GPU part shows more of an advantage vs the CPU with this version
- cleaned up the information for better readability
- GPUs are now shown + benched before the CPU
- added error handling code
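
If you want to put an old (<= V020) time next to a V025 time, the two factors from the change list can be applied as in the sketch below. It simply takes 3.3 and 2.5 at face value; the function and macro names are invented for illustration, and the output is only a rough estimate, not a measured V025 time.

```c
#include <assert.h>

/* Factors quoted in the V025 change list. */
#define GPU_OLD_TO_V025_FACTOR 3.3
#define CPU_OLD_TO_V025_FACTOR 2.5

/* Rough V025 equivalent of a GPU time measured with an old version. */
double estimate_v025_gpu_time(double old_seconds)
{
    assert(old_seconds >= 0.0);
    return old_seconds / GPU_OLD_TO_V025_FACTOR;
}

/* Rough V025 equivalent of a CPU time measured with an old version. */
double estimate_v025_cpu_time(double old_seconds)
{
    assert(old_seconds >= 0.0);
    return old_seconds / CPU_OLD_TO_V025_FACTOR;
}
```

For example, the old GTX 260 time of 1.31 sec comes out at about 0.40 sec, which is in the same range as the 0.357 and 0.365 sec that GTX 260 owners posted with V025.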


SOURCE CODE available - help to get ATI running well / make Bench much better

http://freenet-homep...OpenCL2_SRC.zip



OPENCL - Good to know:
- OpenCL is an API for general-purpose GPU (and CPU) computing
- the main difference to CUDA / ATI Stream is that each of those only works with its "own" GPUs:
a CUDA (NV) app like Badaboom (H.264 on GPU) can't work on an ATI GPU and vice versa
- OpenCL is universal, which means:
- Xcode / GCC compiles an app that includes the source for the GPU program (in C, as a string)
that C source is, unlike with CUDA / ATI Stream, compiled later by OpenCL at runtime!
So the same app can run on completely different GPUs and also, with little or no code change, on the CPU if no
OpenCL-capable GPU (newer ones) is found
The source (example below) for the GPU program really is compiled at runtime, not just interpreted.
So small differences between runs of my bench may happen because of that compile at runtime ;)
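
A minimal sketch of the "source as a string" idea, in plain C so it builds without the OpenCL headers. The kernel text is a simplified one-line vectorAdd, not the exact bench source, and kernel_source_length is a hypothetical helper. In a real host app this string would be handed to clCreateProgramWithSource and then compiled for the chosen device with clBuildProgram.

```c
#include <string.h>

/* The GPU program is ordinary text stored inside the host app.
   Nothing here is compiled for the GPU until the app runs and passes
   this string to the OpenCL runtime. */
static const char *kernel_source =
    "__kernel void vectorAdd(__global const float *a,\n"
    "                        __global const float *b,\n"
    "                        __global float *c)\n"
    "{\n"
    "    int nIndex = get_global_id(0);\n"
    "    c[nIndex] = a[nIndex] + b[nIndex];\n"
    "}\n";

/* Length of the kernel text in bytes, as a host app could pass it
   along with the string when creating the program object. */
size_t kernel_source_length(void)
{
    return strlen(kernel_source);
}
```

Because the device compiler only sees the text at runtime, the same binary can build this kernel for an NVIDIA GPU, an ATI GPU or the CPU device, and a compile error shows up as a clBuildProgram failure rather than an Xcode build error.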


Results posted:
Conclusion so far:
The faster the GPU (more precisely, its units and the GPU program part) and the slower the CPU, the more useful OpenCL is.

NEW V025 test results !

ATIs (no freezes of the bench anymore :) , ATI 4870 works now):
Number of OpenCL devices found: 3
OpenCL Device # 0 = Radeon HD 4870
Device 0 is an: GPU with max. 750 MHz and 4 units/cores // 4 cores are wrong !!! //
Now computing - please be patient....
time used: 4.126 seconds

by mtonkol
Number of OpenCL devices found: 2
OpenCL Device # 0 = Radeon HD 4870
Device 0 is an: GPU with max. 750 MHz and 4 units/cores
time used: 3.997 seconds
At least with the current drivers and my benchmark, the ATI results are useless.
It seems that either OpenCL isn't quite so universal (the same code running optimized on all GPUs) or there are bugs in the ATI OpenCL part. Maybe some OpenCL pragma settings must be set for ATI to get better performance.

NVIDIAs:
proengin is the HERO of the day
:superman:
Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce GTX 285
Device 0 is an: GPU with max. 1584 MHz and 240 units/cores
time used: 0.231 seconds
OpenCL Device # 1 = Intel® Core™ i7 CPU 920 4,3GHz
time used: 1.296 seconds

by grue:
Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 8800 GT
Device 0 is an: GPU with max. 1500 MHz and 112 units/cores
time used: 0.683 seconds
OpenCL Device # 1 = GeForce GTX 260
Device 1 is an: GPU with max. 1400 MHz and 216 units/cores
time used: 0.365 seconds
OpenCL Device # 2 = Intel® Xeon® CPU X5365 @ 3.00GHz
time used: 3.094 seconds

by moondark
Number of OpenCL devices found: 3
OpenCL Device # 0 = GeForce 9600M GT
Device 0 is an: GPU with max. 1250 MHz and 32 units/cores
time used: 2.798 seconds
OpenCL Device # 1 = GeForce 9400M
Device 1 is an: GPU with max. 1100 MHz and 16 units/cores
time used: 9.549 seconds
OpenCL Device # 2 = Intel® Core™2 Duo CPU P8600 @ 2.40GHz
time used: 15.800 seconds

by antic
Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 9500 GT
Device 0 is an: GPU with max. 1350 MHz and 32 units/cores
time used: 3.053 seconds
OpenCL Device # 1 = Intel® Core™2 CPU 6600 @ 3.80GHz
time used: 15.188 seconds

by ricola
Number of OpenCL devices found: 2
OpenCL Device # 0 = GeForce 9400 GT
Device 0 is an: GPU with max. 1375 MHz and 16 units/cores
time used: 3.992 seconds
OpenCL Device # 1 = Intel® Core™2 CPU E7500 @ 3,66 GHz
time used: 12.048 seconds


OLD results <= V020
by elitee: GeForce GTX 260 (1242 MHz/216 units) = 1.31 sec vs E8400_3.00GHz = 36 sec
27 times faster than the CPU :)

by music-anderson (MacPro (Periode1) 4 Cores):
GeForce 8800 GT Mac (1500 MHz/112 units) = 2.6 sec vs Xeon® CPU 5150_2.66GHz = 16.8 sec

by miketress: GeForce 9600 GT = 2.6 sec vs i7 CPU 920_3.8GHz = 8.152 sec

integrated / mobile GPUs:
by ugokind: GeForce 9600M GT (1225 MHz/32 units) = 19.5 sec vs CPU P7350_2.00GHz = 110 sec
by Sherry Haibara: GeForce 9400M = 15.6 sec vs CPU P8700_2.53GHz = 37 sec
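
The "times faster" figures are just the CPU time divided by the GPU time from the same run. A tiny sketch, with speedup as a made-up helper name:

```c
#include <assert.h>

/* How many times faster the GPU finished than the CPU in the same run. */
double speedup(double cpu_seconds, double gpu_seconds)
{
    assert(gpu_seconds > 0.0);
    return cpu_seconds / gpu_seconds;
}
```

For elitee's old result, speedup(36.0, 1.31) is about 27.5, which matches the "27 times faster" above. Only compare times from the same bench version, since the old and V025 numbers are scaled differently.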


The readme is inside the zipped download below.
The newest download link is always the last link (V025).
(The time results changed; see the changes above to compare with old time results.)



#60
mitch_de

Thanks! I will do that soon and upload V030.
EDIT: First runs show that float4 (in the GPU source and the main app) makes the GPU part run significantly slower (on my 9600). Instead of being 10 times faster than the CPU, with float4 it runs 2 times slower than the CPU.

I have also seen that AMD ( http://ati.amd.com/t...tro_opencl.html ) uses float and not float4 in their own example.

I will now change the code back to float, but remove the loop so that the source looks like the AMD example.




