OpenCL Oceanwave & Bandwidth Bench - 07. March 2013

OpenCL AMD NVIDIA

343 replies to this topic

#241
mitch_de

    InsanelyMacaholic

  • Local Moderators
  • 2,879 posts
  • Gender:Male
  • Location:Stuttgart / Germany
Oops, really low (never seen before!) bandwidth speeds for your GT 430 card. Test it again: don't run any other app alongside it, and don't move the mouse while it runs.
The card isn't a burner, but its CPU<>GPU MB/s should be many times higher. Even the lowest-end GPUs like the GT 210/220 should transfer data to/from the GPU over the PCIe slot + CPU + bus at least 3-4 times faster. Such PCIe speeds look like AGP transfer speeds (the old GPU slot type).
Perhaps some interrupt problems?

We should collect some bandwidth results from similar GPUs for that user.


Background: low bandwidth does NOT necessarily mean low OpenGL/OpenCL speeds, but it will have a negative effect: on texture transfers for OpenGL, and on data transfers for OpenCL/CUDA.

#242
RobertX

    InSanelyMac Maverick

  • Members
  • 531 posts
  • Gender:Not Telling
Running on...

GeForce GT 430

Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 162.4

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 203.1

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 11875.8

[oclBandwidthTest] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!

System Profiler says my x16 PCIe card (128-bit) is running at x1... Examining the card, I found that two or more of the gold connector pins appear damaged (not full length like the others). Windows 8 also reports x1 lane width, and it's still faster than my GT 520 (which is only 64-bit) :worried_anim:

#243
RobertX

rolled back my drivers... new results

/Users/leslie/Downloads/oclBandwidthTest Starting...

Running on...

GeForce GT 430

Quick Mode

Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 10129.9

Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 31610.5

Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6642.8

[oclBandwidthTest] test results...
PASSED

> exiting in 3 seconds: 3...2...1...done!


EDIT: got it working... http://www.insanelym...-lane-width-x1/ :smoke:

#244
RobertX

...finally, a somewhat "Happy Hack"

Attached File  GT 430 working.png   366.18KB   14 downloads

:smoke:

#245
mitch_de

Updated the tool to V1.4. Added bandwidth measurement at program start.
Bandwidths:
Device to device MB/s: depends on VRAM speed / CPU speed / GPU speed
Host > device and device > host MB/s: depend on PCIe mode (lanes x1, x8, x16) / CPU / chipset / GPU speed

If someone gets much less than 1000 MB/s (1 GB/s) for host > device and/or device > host, then something is wrong with the PCIe speed (for example, only 1 lane is used instead of 8 or 16). CPU speed and GPU speed don't matter when the values are << 1000 MB/s!
The highest possible values here will be about 8000-9000 MB/s. Bad values are much below 1000 MB/s.

VRAM speed shows up in the device to device MB/s. If the VRAM is clocked low or, much more importantly, has only a 64- or 128-bit bus, you will get worse MB/s here. 256/384/512-bit VRAM shows much higher MB/s.
The highest possible value here will be around 90000 MB/s. Below 15000 MB/s is bad (indicates slow 64/128-bit VRAM).
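
The rules of thumb in this post could be sketched as a small Python helper. Only the 1000 MB/s and 15000 MB/s limits come from the post. The function names and the "ok" label are illustrative assumptions, and the post's "much below" is simplified here to a plain less-than comparison.

```python
# Thresholds taken from the post (MB/s); everything else is illustrative.
PCIE_BAD_BELOW = 1000.0    # host<->device: well below this suggests an x1 link
VRAM_BAD_BELOW = 15000.0   # device->device: below this suggests slow 64/128-bit VRAM


def classify_pcie(mb_per_s: float) -> str:
    """Rate a host>device or device>host result (hypothetical helper)."""
    return "bad" if mb_per_s < PCIE_BAD_BELOW else "ok"


def classify_vram(mb_per_s: float) -> str:
    """Rate a device>device result (hypothetical helper)."""
    return "bad" if mb_per_s < VRAM_BAD_BELOW else "ok"


if __name__ == "__main__":
    # GT 430 values posted in #242
    print(classify_pcie(162.4))    # host > device
    print(classify_pcie(203.1))    # device > host
    print(classify_vram(11875.8))  # device > device
```

With the GT 430 numbers from #242, all three values land in the "bad" range, which matches the x1 lane width reported there.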


#246
RobertX

...just passin' through
Attached File  new.png   385.15KB   7 downloads

#247
mitch_de

Updated to V1.5: UI changes for bandwidth test results.


#248
mitch_de

Wow, fast bandwidth results:
HACKINTOSH OS X 10.8.3 Intel® Core™ i5-3570K CPU @ 3.40GHz 3400 MHz
GPU ATI Radeon HD Pitcairn XT Prototype Compute Engine 1000 MHz 444.9 fps


Bandwidths:
device > host: 12002.8 MB/s
host > device: 10074.9 MB/s
device > device (VRAM): 83085.6 MB/s

What kind of AMD 6xxx/7xxx GPU is that?

Most users will be limited by PCIe 2.0 with a max. of 8000 MB/s.

#249
k3nny

    InsanelyMac Legend

  • Members
  • 538 posts
  • Gender:Male
It is an XFX Radeon HD 7870 DD with the hardware in my signature.
Would be interesting to see a comparison to another 7xxx card. eep357, where are you? :D

#250
eep357

    Triple Platinum

  • Supervisors
  • 2,527 posts
  • Gender:Male
  • Location:Dark Side of The Wall
  • Interests:things and stuff
Attached File  OpenCL OceanWave & bandwidth Benchmark V1.4.jpg   106.1KB   13 downloads

In other news, my Mac Developer account just expired 2 mins ago and I don't have the $ to renew right now :( Hopefully yesterday's beta was the last! :)

#251
mitch_de

PS: The two PCIe transfer speeds don't matter much for gaming / OpenGL when comparing 2000 vs 4000 vs 10000 MB/s. Only when they are very bad (like AGP performance, <= 500 MB/s) do you see lower FPS in games.
A GPU magazine tested that by switching from x16 lanes (up to 8000 MB/s) down to x1 lane (up to 500 MB/s) by manipulating the PCIe slot pins. Going from x16 to x8 or x4 made only a few % difference in FPS, but x1 (up to 500 MB/s) gave 30% lower FPS.
PCIe speed makes much more of a difference for data-hungry GPU compute tasks (CUDA, AMD Stream or OpenCL), where much larger and constant data transfers move over the PCIe bus.

The third value, GPU/VRAM bandwidth, has a direct effect on gaming speed, besides GPU performance.
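
To put the x16 vs x1 figures in numbers: a PCIe 2.0 lane carries roughly 500 MB/s per direction, which gives the 500 and 8000 MB/s limits quoted in this post. The Python sketch below computes the theoretical one-direction peak per link. The per-lane figures for PCIe 1.x and 3.0 are standard values and not from the thread, the function name is made up, and measured transfers come in lower because of protocol overhead.

```python
# Approximate usable bandwidth per lane and direction, in MB/s.
# Standard figures for each PCIe generation, not values from the thread.
MB_PER_S_PER_LANE = {1: 250.0, 2: 500.0, 3: 985.0}


def pcie_peak_mb_per_s(generation: int, lanes: int) -> float:
    """Theoretical one-direction peak for a PCIe link (hypothetical helper)."""
    if lanes not in (1, 2, 4, 8, 16):
        raise ValueError(f"unusual lane count: {lanes}")
    return MB_PER_S_PER_LANE[generation] * lanes


if __name__ == "__main__":
    for lanes in (1, 4, 8, 16):
        print(f"PCIe 2.0 x{lanes}: {pcie_peak_mb_per_s(2, lanes):.0f} MB/s")
```

Results above 8000 MB/s, like the 12002.8 MB/s in #248, would fit a PCIe 3.0 x16 link (about 15760 MB/s peak), though the thread does not confirm the link generation.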

#252
eep357

mitch_de: Is the 3rd value 100% GPU dependent, so that the same card should give the same results in any system it's installed in?

#253
Wayang-NT

    InsanelyMac Geek

  • Members
  • 132 posts
  • Gender:Male
12D74 ...

windowed
Attached File  12D74 - 2013-03-02 at 9.51.39 .jpg   144.86KB   4 downloads

FS
Attached File  12D74 - 2013-03-02 at 9.52.44 .jpg   144.95KB   3 downloads

#254
eep357

@k3nny - If 90000 MB/s is the max possible for 512-bit VRAM, I don't think device > device (VRAM): 83085.6 MB/s can be possible with the 256-bit GDDR5 memory on the 7870? Since you're using Clover to boot, be sure there are no values entered for CPU speed or Turbo in config.plist, as this can slow down the OS system clock and cause the OS to think things are going faster than they really are.

#255
k3nny

I left these settings for Clover to decide. I neither have CPU Speed, nor Turbo in my config file. I don't get the big difference either.

#256
mitch_de


mitch_de: Is the 3rd value 100% GPU dependent, so that the same card should give the same results in any system it's installed in?

Yes, it should be 100% GPU dependent. Only if AGPM on the other PC / OS X system is set up differently or works wrongly, so that the GPU + VRAM clocks end up much different, will the results (all of them!) also differ a lot, even with the same GPU.


Interesting that one user gets much higher FPS running OceanWave in fullscreen (= much higher res) than windowed at 500x500. My 9600 GT is much slower in fullscreen 1400x900 than in 500x500 windowed.

#257
eep357

Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4647.1
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6425.0
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 143930.1
[oclBandwidthTest] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
Using the command line version, I get much different results?

Also, in the logs of the bench I see lots of
<program source>:226:26: warning: double precision constant requires cl_khr_fp64, casting to single precision
And it's using OpenCL 1.1, driver version 1.0, giving it far fewer extensions to utilize. It also shows no double precision support, which is a feature of this card but requires OpenCL 1.2.
[Device 0]
Name: ATI Radeon HD Tahiti XT Prototype Compute Engine
Vendor: AMD
Type: GPU
Device Version: OpenCL 1.1
Driver Version: 1.0
Compute Units: 32
Work Group Size: 1024
Clock: 1050 MHz
Global Memory: 1536 MB
Local Memory: 32 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions:
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
vs CPU
[Device 1]
Name: Intel(R) Core(TM) i7 CPU		 920 @ 2.67GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.2
Driver Version: 1.1
Compute Units: 8
Work Group Size: 1024
Clock: 3800 MHz
Global Memory (Total): 24576 MB
Global Memory (Host): 24576 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 8388608 Bytes
Available: Yes
Double-Precision: Yes
Extensions:
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_3d_image_writes
cl_APPLE_fp64_basic_ops
cl_APPLE_fixed_alpha_channel_orders
cl_APPLE_biased_fixed_point_image_formats
Since this info comes from the same oclinfo tool I won't post its output, but Luxmark correctly shows OpenCL 1.2, and of course it benches very well with this GPU.

#258
mitch_de

OpenCL Info: Luxmark shows OpenCL 1.2, but that is not the device version (which is 1.1). Instead it shows the OpenCL platform version, which is always 1.2 on OS X, no matter which GPU you use. I have already discussed that in the Luxmark dev thread. They will fix it in a future version by using the device version instead of the platform version.
Platform version means the highest OpenCL version the platform (the software driver) can handle, independent of the GPU hardware features.
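
The difference between the two version strings can be illustrated without touching a GPU. The OpenCL spec defines both CL_PLATFORM_VERSION and CL_DEVICE_VERSION as strings of the form "OpenCL <major>.<minor> <vendor-specific info>". The Python sketch below only parses such strings. The sample strings and function names are made up for illustration, and a real tool would first query the strings via clGetPlatformInfo and clGetDeviceInfo.

```python
import re


def parse_opencl_version(version_string: str) -> tuple:
    """Extract (major, minor) from an 'OpenCL <major>.<minor> ...' string."""
    match = re.match(r"OpenCL (\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"not an OpenCL version string: {version_string!r}")
    return int(match.group(1)), int(match.group(2))


def version_to_report(platform_version: str, device_version: str) -> tuple:
    """Report the lower of platform and device version for a device."""
    return min(parse_opencl_version(platform_version),
               parse_opencl_version(device_version))


if __name__ == "__main__":
    # Sample strings modelled on the output in #257 (the suffix is invented).
    platform = "OpenCL 1.2 (sample platform)"
    gpu = "OpenCL 1.1"
    print(version_to_report(platform, gpu))
```

Taking the lower of the two is a simplification chosen for this sketch: it makes a 1.1 device on a 1.2 platform show up as 1.1, which is the behaviour asked for here.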

OpenCL bandwidth console vs OceanWave GPU/VRAM bandwidth result: thanks, you found a little cosmetic bug. The VRAM value was truncated on the left if > 99.9 GB/s.
So the leading 1 was not shown and the result looked way lower :)
Don't worry about the OpenCL compiler warnings, they don't matter, and the code for OpenCLInfo and OpenCL OceanWave comes from Apple ;)

UPDATED to Version 1.5.1 (DL first post)
- fixed truncation of VRAM speed if > 99999 MB/s
- reformatted to show GB/s instead of MB/s = more readable than big MB/s values
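
The actual UI code is not shown in the thread, so the following Python sketch only imitates the effect: a display field that keeps the rightmost 7 characters drops the leading digit once the value reaches 100000 MB/s. The field width and the function names are assumptions.

```python
FIELD_WIDTH = 7  # assumed width: fits '99999.9' but not '143930.1'


def show_truncated(mb_per_s: float) -> str:
    """Imitate the bug: keep only the rightmost FIELD_WIDTH characters."""
    text = f"{mb_per_s:.1f}"
    return text[-FIELD_WIDTH:]


def show_gb(mb_per_s: float) -> str:
    """Imitate the fix: show GB/s, which needs far fewer digits."""
    return f"{mb_per_s / 1000:.1f} GB/s"


if __name__ == "__main__":
    value = 143930.1  # device to device result from #257
    print(show_truncated(value))
    print(show_gb(value))
```

Values below 100000 MB/s, such as the 83085.6 MB/s from #248, pass through the truncating version unchanged, which is why the bug only showed up on the fastest cards.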


#259
eep357

OK, not my best FPS ever, but the device to device bandwidth is awesome :) Guess 90GB/s is not max then. Still don't get why CPU would be higher device version than GPU though.
Attached File  OpenCL OceanWave & bandwidth Benchmark V1.5.1.jpg   113.06KB   13 downloads

#260
mitch_de

Yep, I googled a bit about VRAM bandwidth: it can be much higher than my guessed 90 GB/s. Maybe the most modern GPUs like Titan or AMD 78xx/79xx have some enhanced caches (similar to L1/L2 CPU caches) which make VRAM access faster, at least for small memory transfers. But the OpenCL bandwidth test transfers huge amounts (3 MB parts, many times), so GPU caches will not help much.
Great to see that the leading 1 of 140 is no longer truncated :)
PS: CPUs with an internal GPU like the Intel HD 4000, without a discrete GPU, may have interesting bandwidth results. Some of the values may be much faster than over normal PCIe x8 / x16 lanes. I don't know how, and how fast, the GPU part is connected to the CPU / bus. But OpenCL speed (here FPS) will be slow, of course.
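
On the question of how high VRAM bandwidth can go: the theoretical peak follows from bus width and memory data rate, with GDDR5 moving 4 bits per pin per memory clock cycle. A minimal Python sketch is below. The function name is made up, and the example clock of 1200 MHz is an assumption for a 256-bit GDDR5 card, since the thread does not state any memory clocks.

```python
def gddr5_peak_gb_per_s(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """Theoretical peak VRAM bandwidth in GB/s (hypothetical helper).

    GDDR5 moves 4 bits per pin per memory clock cycle.
    """
    transfers_per_s = memory_clock_mhz * 1e6 * 4
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_s * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Assumed example: 256-bit bus, 1200 MHz memory clock (not from the thread).
    print(f"{gddr5_peak_gb_per_s(256, 1200):.1f} GB/s")
```

Under those assumptions a 256-bit card could reach about 153.6 GB/s in theory, so a measured 83 GB/s would be within range, and the earlier 90 GB/s guess was too low as a ceiling.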




