Oops, really low (never seen before!) bandwidth speeds for your GT 430 card. Test it again: don't run it alongside any other app, and don't move the mouse while it runs.
The card isn't a burner, but the CPU<>GPU MB/s should be many times faster. Even the lowest-end GPUs like the GT 210/220 should be at least 3-4 times faster when the PCIe slot + CPU + bus transfer the data to/from the GPU. Such PCIe speeds look like AGP transfer speeds (the old GPU slot type).
Perhaps some interrupt problems?
We should collect some similar GPU bandwidth results for that user.
Background: low bandwidth speeds may NOT result in equally low OpenGL/OpenCL speeds, but they will have a negative effect: on OpenGL for texture transfers, on OpenCL/CUDA for data transfers.
OpenCL Oceanwave & Bandwidth Bench - 07. March 2013
Started by mitch_de, Sep 18 2011 09:59 AM
OpenCL AMD NVIDIA
310 replies to this topic
#241
Posted 05 February 2013 - 01:25 PM
#242
Posted 05 February 2013 - 10:56 PM
Running on...
GeForce GT 430
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 162.4
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 203.1
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 11875.8
[oclBandwidthTest] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
System Profiler says my x16 PCIe card (128-bit) is running at x1. Examining the card, I've found that two or more of the gold connector pins appear damaged (not full length like the others). Windows 8 also reports x1 lane width, and it's still faster than my GT 520 (which is only 64-bit).
#243
Posted 06 February 2013 - 02:06 AM
/Users/leslie/Downloads/oclBandwidthTest Starting...
Running on...
GeForce GT 430
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 10129.9
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 31610.5
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6642.8
[oclBandwidthTest] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
EDIT: got it working... http://www.insanelym...-lane-width-x1/
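For anyone who wants to collect and compare several of these runs, here is a minimal Python sketch that pulls the three MB/s values out of an oclBandwidthTest log. It assumes the log has exactly the layout shown in the posts above; the names `parse_bandwidth_log`, `SECTIONS` and `ROW` are made up for this example and are not part of any tool.

```python
import re

# Section headers as they appear in the oclBandwidthTest logs in this thread.
SECTIONS = {
    "Host to Device Bandwidth": "host_to_device",
    "Device to Host Bandwidth": "device_to_host",
    "Device to Device Bandwidth": "device_to_device",
}

# A result row is "<transfer size in bytes> <bandwidth in MB/s>".
ROW = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+)?)\s*$")


def parse_bandwidth_log(text):
    """Return {section: MB/s} for each section found in the log text."""
    results = {}
    current = None
    for line in text.splitlines():
        for header, key in SECTIONS.items():
            if line.strip().startswith(header):
                current = key  # remember which section the next row belongs to
                break
        else:
            match = ROW.match(line)
            if match and current is not None:
                results[current] = float(match.group(2))
                current = None
    return results


# Values copied from the x1-lane GT 430 run posted above.
sample = """\
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 162.4
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 203.1
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 11875.8
"""

print(parse_bandwidth_log(sample))
```

Only the Quick Mode layout (one row per section) is handled; a log with several transfer sizes per section would need the row handling extended.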
#244
Posted 11 February 2013 - 10:20 PM
#245
Posted 25 February 2013 - 10:29 AM
Updated the tool to V1.4: added bandwidth measurement at program start.
Bandwidths:
VRAM speed / CPU speed / GPU speed = device to device MB/s
PCIe mode (lanes x1, x8, x16) / CPU / chipset / GPU speed = host > device & device > host MB/s
If someone gets much less than 1000 MB/s (1 GB/s) in the host > device and/or device > host values, then something is wrong with the PCIe speed (only 1 lane used instead of 8 or 16). CPU speed and GPU speed don't matter in this case of << 1000 MB/s!
The highest possible values here will be about 8000-9000 MB/s. Bad values are much below 1000 MB/s.
VRAM speed can be seen in the device to device MB/s. If the VRAM is clocked low or, much more important, has only a 64- or 128-bit bus, you will get worse MB/s here. 256/384/512-bit VRAM shows much faster MB/s.
The highest possible value here will be around 90000 MB/s. Bad (indicates slow 64/128-bit VRAM) is below 15000 MB/s.
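The rules of thumb above can be written down as a small check. This is only a sketch in Python using the two thresholds from this post (1000 MB/s for the PCIe values, 15000 MB/s for VRAM); the function name `check_results` and the wording of the notes are invented for illustration and are not taken from the tool.

```python
# Thresholds taken from the post above; treat them as rough rules of thumb.
PCIE_BAD_BELOW_MB_S = 1000.0
VRAM_SLOW_BELOW_MB_S = 15000.0


def check_results(host_to_device, device_to_host, device_to_device):
    """Return a list of notes for values (in MB/s) that look suspicious."""
    notes = []
    if min(host_to_device, device_to_host) < PCIE_BAD_BELOW_MB_S:
        notes.append("PCIe transfer below 1000 MB/s: check the link width (x1 instead of x8/x16?)")
    if device_to_device < VRAM_SLOW_BELOW_MB_S:
        notes.append("VRAM below 15000 MB/s: typical of a 64/128-bit memory bus")
    return notes


# The GT 430 run from post #242 trips both checks.
for note in check_results(162.4, 203.1, 11875.8):
    print(note)
```

With the GT 430 values from post #242 both notes are printed, which matches the x1 lane width that user found in System Profiler.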
Attached Files
#247
Posted 01 March 2013 - 02:28 PM
update to V 1.5 - UI changes for bandwidth test results.
Attached Files
#248
Posted 01 March 2013 - 03:01 PM
Wow got fast bandwidth results:
HACKINTOSH OS X 10.8.3 Intel® Core™ i5-3570K CPU @ 3.40GHz 3400 MHz
GPU ATI Radeon HD Pitcairn XT Prototype Compute Engine 1000 MHz 444.9 fps
Bandwidths:
device > host: 12002.8 MB/s
host > device: 10074.9 MB/s
device > device (VRAM): 83085.6 MB/s
What kind of AMD 6xxx/7xxx GPU?
Most users will be limited by PCIe 2.0 with max. 8000 MB/s.
#249
Posted 01 March 2013 - 03:07 PM
It is an XFX Radeon HD 7870 DD with the hardware in my signature.
Would be interesting to see a comparison to another 7xxx card. eep357, where are you?
#250
Posted 01 March 2013 - 03:25 PM
OpenCL OceanWave & bandwidth Benchmark V1.4.jpg 106.1K
13 downloads
In other news, my Mac Developer account just expired 2 mins ago and I don't have the $ to renew right now
#251
Posted 02 March 2013 - 12:02 AM
PS: The two PCIe transfer speeds don't matter much for gaming/OpenGL when comparing 2000 vs 4000 vs 10000 MB/s. Only if they are very bad (AGP-like performance, <= 500 MB/s) do you see fewer FPS in gaming.
A GPU magazine tested this by switching from x16 lanes (up to 8000 MB/s) down to x1 lane (up to 500 MB/s) by manipulating the PCIe slot pins. x16 > x8 or x4 made only a few % difference in FPS, but x1 (up to 500 MB/s) gave 30% fewer FPS.
PCIe speed makes much more of a difference for data-hungry GPU compute tasks (CUDA, AMD Stream or OpenCL), where much more data is moved constantly over the PCIe bus.
The third value, GPU/VRAM, has a direct effect on gaming speed, besides GPU performance.
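The x16 and x1 figures quoted above (8000 and 500 MB/s) follow from the per-lane signalling rate and the line encoding of PCIe 2.0. As a rough sketch in Python (the function name `pcie_peak_mb_s` is just for this example), the theoretical peak per direction can be worked out like this; real transfers always land below it because of protocol overhead.

```python
# Per PCIe generation: (transfer rate in GT/s, payload bits, total bits on the wire).
# Gen 1 and 2 use 8b/10b encoding, gen 3 uses 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8, 10),
    2: (5.0, 8, 10),
    3: (8.0, 128, 130),
}


def pcie_peak_mb_s(generation, lanes):
    """Theoretical peak bandwidth in MB/s, one direction, before protocol overhead."""
    rate_gt_s, payload_bits, total_bits = PCIE_GENERATIONS[generation]
    # GT/s * 1000 gives raw Mbit/s per lane; apply the encoding, then bits -> bytes.
    per_lane = rate_gt_s * 1000 * payload_bits / total_bits / 8
    return per_lane * lanes


for gen, lanes in [(2, 1), (2, 16), (3, 16)]:
    print(f"PCIe {gen}.0 x{lanes}: {pcie_peak_mb_s(gen, lanes):.0f} MB/s")
```

Results above 8000 MB/s, like the 10000-12000 MB/s posted for the Pitcairn card, cannot come from a PCIe 2.0 x16 link, so they would be consistent with a PCIe 3.0 link (that is an inference, not something confirmed in the thread).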
#252
Posted 02 March 2013 - 02:41 AM
mitch_de: Is 3rd value 100% GPU dependent to where same card should have same results in any system it's installed?
#253
Posted 02 March 2013 - 03:00 AM
12D74 ...
windowed
12D74 - 2013-03-02 at 9.51.39 .jpg 144.86K
3 downloads
FS
12D74 - 2013-03-02 at 9.52.44 .jpg 144.95K
3 downloads
#254
Posted 02 March 2013 - 05:55 AM
@k3nny- If 90000 MB/s is the max possible for 512-bit VRAM, I don't think device > device (VRAM): 83085.6 MB/s can be possible with the 256-bit GDDR5 memory on the 7870? Since you're using Clover to boot, be sure there are no values entered for CPU speed or Turbo in config.plist, as this can slow down the OS system clock and cause the OS to think things are going faster than they really are.
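One way to sanity-check a device > device number is to compare it with the theoretical VRAM peak, which is bus width in bytes times the effective data rate. A minimal Python sketch follows; the 4.8 GT/s effective GDDR5 rate is an assumed example figure, not a value taken from this thread, so plug in the real memory clock of the card being checked.

```python
def vram_peak_mb_s(bus_width_bits, data_rate_gt_s):
    """Theoretical peak VRAM bandwidth in MB/s.

    bus_width_bits: memory bus width (64, 128, 256, ...)
    data_rate_gt_s: effective transfers per second per pin, in GT/s
    """
    bytes_per_transfer = bus_width_bits / 8
    # GT/s * 1000 = mega-transfers per second, so the result is in MB/s.
    return bytes_per_transfer * data_rate_gt_s * 1000


# Assumed example: 256-bit bus at an effective 4.8 GT/s.
print(f"{vram_peak_mb_s(256, 4.8):.0f} MB/s")
```

Under that assumption a 256-bit bus would peak well above 90000 MB/s, so a measured 83085.6 MB/s would not be impossible for such a card.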
#255
Posted 02 March 2013 - 06:22 AM
I left these settings for Clover to decide. I neither have CPU Speed, nor Turbo in my config file. I don't get the big difference either.
#256
Posted 02 March 2013 - 07:30 AM
eep357, on 02 March 2013 - 02:41 AM, said:
mitch_de: Is 3rd value 100% GPU dependent to where same card should have same results in any system it's installed?
Interesting that one user gets much higher FPS running OceanWave in fullscreen (= much higher resolution) than windowed at 500x500. My 9600 GT is much slower in fullscreen at 1400x900 than windowed at 500x500.
#257
Posted 02 March 2013 - 08:09 AM
Quick Mode
Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 4647.1
Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 6425.0
Device to Device Bandwidth, 1 Device(s)
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 143930.1
[oclBandwidthTest] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
Using the command line version, much different results?
Also, in the logs of the bench I see lots of
<program source>:226:26: warning: double precision constant requires cl_khr_fp64, casting to single precision
And it's using OpenCL 1.1, driver version 1.0, giving it far fewer extensions to utilize, and also showing no double precision support, which is a feature of this card but requires OpenCL 1.2
[Device 0]
Name: ATI Radeon HD Tahiti XT Prototype Compute Engine
Vendor: AMD
Type: GPU
Device Version: OpenCL 1.1
Driver Version: 1.0
Compute Units: 32
Work Group Size: 1024
Clock: 1050 MHz
Global Memory: 1536 MB
Local Memory: 32 KB
Cache Size: 0 KB
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store
vs CPU
[Device 1]
Name: Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.2
Driver Version: 1.1
Compute Units: 8
Work Group Size: 1024
Clock: 3800 MHz
Global Memory (Total): 24576 MB
Global Memory (Host): 24576 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 8388608 Bytes
Available: Yes
Double-Precision: Yes
Extensions: cl_APPLE_SetMemObjectDestructor cl_APPLE_ContextLoggingFunctions cl_APPLE_clut cl_APPLE_query_kernel_names cl_APPLE_gl_sharing cl_khr_gl_event cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_APPLE_fp64_basic_ops cl_APPLE_fixed_alpha_channel_orders cl_APPLE_biased_fixed_point_image_formats
Since this info comes from the same oclinfo tool I won't post its output, but LuxMark shows it correctly as OpenCL 1.2, and of course it benches very well with this GPU
#258
Posted 02 March 2013 - 12:54 PM
OpenCL Info: LuxMark shows OpenCL 1.2, but what it shows is not the device version (which is 1.1); instead it shows the OpenCL platform version, which is always 1.2 on OS X no matter which GPU you use. I have already discussed that in the LuxMark dev thread. They will fix that in a future version by using the device version and not the platform version.
Platform version means the highest OpenCL version the platform (the software driver) can handle, independent of the GPU hardware features.
OpenCL bandwidth console vs OceanWave bandwidth GPU/VRAM result: Thanks, you found a little cosmetic bug. The VRAM value was truncated on the left if > 99.9 GB/s.
So the leading 1 was not shown and the result looked much lower.
Don't worry about the OpenCL compiler warnings, they don't matter, and the code for OpenCLInfo and OpenCL OceanWave comes from Apple.
UPDATED to Version 1.5.1 (DL first post)
- fixed truncation of VRAM speed if > 99999 MB/s
- reformatted to show GB/s instead of MB/s = more readable than big MB/s values
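To illustrate the cosmetic bug and the GB/s display described above, here is a small Python sketch. It is not the tool's actual code (that is not shown in this thread); `truncated_left` only mimics a five-digit field that drops leading digits, and `format_gb_s` shows the kind of GB/s formatting the update describes. Both names are hypothetical.

```python
def truncated_left(mb_per_s, width=5):
    """Mimic a display field that keeps only the last `width` digits."""
    digits = str(int(mb_per_s))
    return digits[-width:]


def format_gb_s(mb_per_s):
    """Show a bandwidth value in GB/s with one decimal place."""
    return f"{mb_per_s / 1000:.1f} GB/s"


value = 143930.1  # device to device MB/s from the command line run above
print(truncated_left(value))  # the leading digit is lost in a 5-digit field
print(format_gb_s(value))
```

Values up to 99999 MB/s pass through the five-digit field unchanged, which is why the bug only showed up on the fastest cards.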
Attached Files
#259
Posted 02 March 2013 - 01:19 PM
OK, not my best FPS ever, but the device to device bandwidth is awesome
Guess 90GB/s is not max then. Still don't get why CPU would be higher device version than GPU though.
OpenCL OceanWave & bandwidth Benchmark V1.5.1.jpg 113.06K
11 downloads
#260
Posted 02 March 2013 - 03:06 PM
Yep, I googled a bit about VRAM bandwidth: it can be much higher than my guessed 90 GB/s. Maybe the most modern GPUs like Titan or AMD 78xx/79xx have some enhanced caches (similar to L1/L2 CPU caches) which make VRAM access faster, at least for small memory transfers. But the OpenCL bandwidth test transfers huge amounts (3 MB parts, many times), so GPU caches will not help much.
Great to see that the truncated 1 of 140 is gone
PS: CPUs with an integrated GPU like the Intel HD 4000, without a discrete GPU, may have interesting bandwidth results. Some of the values may be much faster than using normal PCIe x8/x16 lanes. I don't know how, and how fast, the GPU part is connected to the CPU/bus. But OpenCL speed (here FPS) will of course be slow.


