
OpenCL fix for non-GF100/GF110 cards (aka CC/SM 2.1+)



#1 cmf (InsanelyMac Geek, Members, 145 posts)

Note: This still applies to 10.7.4 and 10.8! No longer needed for 10.9!


good news everyone ;)

After I bought a GTX 560 Ti, I noticed a few odd things about the OpenCL support of this card.
It reports that it supports all these features, but it actually doesn't, and kernels fail to compile with errors like "requires .target sm_12 or higher" even though it's an sm_21 card. So I started digging, and from the looks of it, Apple's OpenCL compiler only (directly) supports cards up to sm_20 (Quadro 4000, GTX 480/470/580/570). For anything higher, it falls back to sm_10 or sm_11.

The solution: let's just pretend we have a 2.0 card :D

So, open up a hex editor of your choice and do the following:
Open /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib (as root or with sudo).

On 10.7.x and 10.8.0 through 10.8.2:
find: 8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02
replace with: 31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90

On 10.8.3+ (as mentioned here):
find: 8B 81 1C 0C 00 00 EB 06 8B 81 20 0C 00 00
replace with: B8 02 00 00 00 90 EB 06 B8 00 00 00 00 90

Save the file.
A reboot is not required, but recommended.
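If you'd rather not hand-edit in a hex editor, the find/replace above can be scripted. Below is a minimal Python sketch, assuming Python 3 and that it's run as root; the names patch_file and PATCHES, the pattern labels and the ".orig" backup suffix are my own (hypothetical) choices, not part of the original instructions. It only writes when the pattern occurs exactly once and keeps an untouched backup copy.

```python
import shutil

# Path from the post above (needs root to modify).
LIBCLH = "/System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib"

# (find, replace) byte patterns from the post; the labels are just for readability.
PATCHES = {
    "10.7.x-10.8.2": (
        bytes.fromhex("8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02"),
        bytes.fromhex("31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90"),
    ),
    "10.8.3+": (
        bytes.fromhex("8B 81 1C 0C 00 00 EB 06 8B 81 20 0C 00 00"),
        bytes.fromhex("B8 02 00 00 00 90 EB 06 B8 00 00 00 00 90"),
    ),
}


def patch_file(path, find, replace, backup_path=None):
    """Replace the single occurrence of `find` with `replace` in the file at `path`.

    Returns True if the file was patched, False if it already contains `replace`
    (and no `find`). Raises ValueError in every other case, without writing.
    """
    if len(find) != len(replace):
        raise ValueError("replacement must have the same length as the pattern")
    with open(path, "rb") as f:
        data = f.read()
    count = data.count(find)
    if count == 0:
        if replace in data:
            return False  # already patched
        raise ValueError("pattern not found (different OS X / driver version?)")
    if count > 1:
        raise ValueError(f"pattern found {count} times, refusing to guess")
    # Keep an untouched copy so the change can be undone.
    shutil.copy2(path, backup_path or path + ".orig")
    with open(path, "wb") as f:
        f.write(data.replace(find, replace, 1))
    return True
```

For example, on 10.8.3+ you would call patch_file(LIBCLH, *PATCHES["10.8.3+"]) from a root Python session. Refusing to act on zero or multiple matches is deliberate: a blind replace on the wrong driver build is exactly how you end up with a broken libclh.dylib.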

What this basically does is replace the dynamic compute capability (CC) device info in clhDeviceComputeCapability with a hardcoded 2.0. Note that this is x64 only for the moment (which most people are certainly using since 10.7). I will add x86 support at a later point.
Also, if you have another NVIDIA card installed that isn't sm_20, this will (probably) break OpenCL support for it.
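To see what the bytes actually do, here is a toy Python decoder that understands only the handful of x86-64 instruction forms appearing in these patterns; it is not a general disassembler (otool -tv or a real disassembler is the proper tool). Decoded this way, the original 10.7.x code loads two 32-bit fields at offsets 0xC1C and 0xC20 from the structure pointed to by rdi and stores them through rsi and rdx. Under the usual x86-64 calling convention those are the first three arguments, which fits a (device, &major, &minor) shape, but that reading is my assumption. The replacement stores 2 and 0 instead and pads with NOPs so the code size stays the same. The 10.8.3+ replacement swaps each load for mov eax, imm32 plus a NOP and keeps the jmp short +6 in between.

```python
import struct

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode(code):
    """Toy x86-64 decoder covering only the opcodes used in the libclh patches."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if op == 0x90:  # nop
            out.append("nop")
            i += 1
        elif op == 0xB8:  # mov eax, imm32
            (imm,) = struct.unpack_from("<I", code, i + 1)
            out.append(f"mov eax, {imm}")
            i += 5
        elif op == 0xEB:  # jmp rel8 (relative to the next instruction)
            (rel,) = struct.unpack_from("<b", code, i + 1)
            out.append(f"jmp short {rel:+d}")
            i += 2
        elif op in (0x8B, 0x89, 0x31, 0xFF):  # opcodes followed by a ModRM byte
            modrm = code[i + 1]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if op == 0x8B and mod == 2 and rm != 4:  # mov r32, [r64 + disp32]
                (disp,) = struct.unpack_from("<I", code, i + 2)
                out.append(f"mov {REG32[reg]}, [{REG64[rm]}+{disp:#x}]")
                i += 6
            elif op == 0x89 and mod == 0 and rm not in (4, 5):  # mov [r64], r32
                out.append(f"mov [{REG64[rm]}], {REG32[reg]}")
                i += 2
            elif op == 0x31 and mod == 3:  # xor r32, r32
                out.append(f"xor {REG32[rm]}, {REG32[reg]}")
                i += 2
            elif op == 0xFF and mod == 3 and reg == 0:  # inc r32
                out.append(f"inc {REG32[rm]}")
                i += 2
            else:
                raise ValueError(f"unsupported ModRM {modrm:#04x} at offset {i}")
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at offset {i}")
    return out


if __name__ == "__main__":
    for label, hexstr in [
        ("10.7.x-10.8.2 original", "8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02"),
        ("10.7.x-10.8.2 patched", "31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90"),
        ("10.8.3+ original", "8B 81 1C 0C 00 00 EB 06 8B 81 20 0C 00 00"),
        ("10.8.3+ patched", "B8 02 00 00 00 90 EB 06 B8 00 00 00 00 90"),
    ]:
        print(label + ": " + "; ".join(decode(bytes.fromhex(hexstr))))
```

Anything outside those few encodings raises instead of guessing, so the decoder can't silently mislabel a different driver build.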

Now, everything that did work before should still be working ...

[Device 0]
Name: GeForce GTX 560 Ti
Vendor: NVIDIA
Type: GPU
Device Version: OpenCL 1.1
Driver Version: CLH 1.0
Compute Units: 16
Work Group Size: 1024
Clock: 0 MHz
Global Memory: 1024 MB
Local Memory: 48 KB
Cache Size: 0 Bytes
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions (12):
cl_APPLE_ContextLoggingFunctions
cl_APPLE_SetMemObjectDestructor
cl_APPLE_clut
cl_APPLE_fp64_basic_ops
cl_APPLE_gl_sharing
cl_APPLE_query_kernel_names
cl_khr_byte_addressable_store
cl_khr_gl_event
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics

... but programs that use some advanced OpenCL features (e.g. LuxMark) should work now too:
Attached screenshot: Screen_Shot_2011_08_24_at_1.44.15_AM.png
Attached screenshot: Screen Shot 2012-02-09 at 10.04.48 AM.png

Edited by cmf, 17 September 2013 - 10:34 AM.


#2 riprod (InsanelyMac Protégé, Members, 12 posts)



I have a GTX 460 and I've applied netkas' OpenCL fix. I tried your method above, and when running LuxMark I get this error: 2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.

#3 cmf (InsanelyMac Geek, Members, 145 posts)

riprod wrote:
> I have a 460 GTX, I've done netkas opencl fix. I tried your method above and when running Luxmark I get this error in Luxmark: 2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.

I haven't checked this on 10.7.0 or 10.7.1 yet, but I just tried the 10.7.0 NVIDIA drivers on 10.7.2 and it works there too (there aren't any relevant PTX/NVIDIA changes in OpenCL.framework, so this shouldn't matter).

It's either a GTX 460 issue or another issue altogether. I would guess the latter, since the error message you get from LuxMark without this fix is a different one ("- OpenCL ERROR: clBuildProgram(-11)").
Could you check whether you have OpenCL support at all (click)?

#4 riprod (InsanelyMac Protégé, Members, 12 posts)

I ran oclinfo and it reports that 1 OpenCL device was found. This is what it says:

1 OpenCL device found!

[Device 0]
Name: Intel® Core™2 Quad CPU Q8400 @ 2.66GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.1
Driver Version: 1.1
Compute Units: 4
Work Group Size: 1024
Clock: 3000 MHz
Global Memory (Total): 8192 MB
Global Memory (Host): 8192 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 2097152 Bytes
Available: Yes
Double-Precision: Yes
Extensions:
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_3d_image_writes
cl_APPLE_fp64_basic_ops
cl_APPLE_fixed_alpha_channel_orders
cl_APPLE_biased_fixed_point_image_formats

Any ideas?

#5 montiniz (InsanelyMac Protégé, Members, 12 posts, Gender: Male, Location: Chicago)

works for me ;)

#6 cmf (InsanelyMac Geek, Members, 145 posts)

riprod wrote:
> I ran oclinfo and it indicates there is 1 OpenCL device found this is what it says:
>
> Any ideas?

Which OS X version? Have you really applied the initial OpenCL fix?
http://netkas.org/?p=794 (for 10.7.0 and 10.7.1) or http://netkas.org/?p...#comment-173693 (for 10.7.2)
If that didn't work, try this in the console: echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile
This will at least make it work partially (but not LuxMark, which seems to do some other weird stuff ...).
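One small catch with the echo line: it appends another copy of the export every time it's run. Below is a minimal Python sketch of two alternatives; only the variable name CL_ENABLE_SM2_DEVICE comes from the post, while the helper names ensure_profile_line and run_with_sm2 are hypothetical. The first adds the export only if it's missing; the second sets the variable for a single launch without touching any profile. Also note that ~/.profile is read by bash login shells (such as the ones Terminal opens) only when there is no ~/.bash_profile or ~/.bash_login, and apps started from the Finder or Dock don't read it, so launch the program from Terminal.

```python
import os
import subprocess
from pathlib import Path

LINE = "export CL_ENABLE_SM2_DEVICE=1"  # the variable from the post above


def ensure_profile_line(profile, line=LINE):
    """Append `line` to the shell profile unless it is already there.

    Returns True if the line was added. Unlike `echo ... >> ~/.profile`,
    running this twice does not create duplicate lines.
    """
    profile = Path(profile)
    text = profile.read_text() if profile.exists() else ""
    if line in text.splitlines():
        return False
    with profile.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # don't glue the export onto an unterminated last line
        f.write(line + "\n")
    return True


def run_with_sm2(cmd, **kwargs):
    """Run one program with CL_ENABLE_SM2_DEVICE=1 set only for that process."""
    env = dict(os.environ, CL_ENABLE_SM2_DEVICE="1")
    return subprocess.run(cmd, env=env, **kwargs)
```

For example, ensure_profile_line(Path.home() / ".profile") makes the persistent change once, while run_with_sm2(["/path/to/some/opencl/app"]) (a placeholder path) only affects that one run.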

#7 riprod (InsanelyMac Protégé, Members, 12 posts)

cmf wrote:
> which os x version? have you really applied the initial opencl fix?


I applied the OpenCL fix from netkas, but I think that was when I was on 10.7.0. Now I'm on OS X 10.7.1. I will try to reapply the netkas OpenCL fix and see if that changes anything. I'll post back with any differences.

#8 riprod (InsanelyMac Protégé, Members, 12 posts)

I took a look at my /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/GeForceGLDriver. I found that the netkas edits I had done were either not saved or had been overwritten by the update from 10.7.0 to 10.7.1. I reapplied the netkas hex edits, tried LuxMark v1.0 and it worked!

Thank you.

Attached screenshot: Screen_Shot_2011_08_25_at_1.22.40_PM.png
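As riprod found, an OS X update can silently replace a patched driver binary. A quick way to check is to look for both the original and the replacement bytes in the file; here is a minimal Python sketch of that, assuming you know both byte strings. The netkas bytes aren't given in this thread, so the sketch defaults to cmf's 10.7.x to 10.8.2 pattern for libclh.dylib; the function name patch_status and its result labels are hypothetical.

```python
from pathlib import Path

# cmf's 10.7.x-10.8.2 pattern, used here only as an example; substitute the
# bytes of whichever patch (e.g. netkas') you want to verify.
ORIGINAL = bytes.fromhex("8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02")
PATCHED = bytes.fromhex("31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90")


def patch_status(path, original=ORIGINAL, patched=PATCHED):
    """Report whether a binary still contains the original or the patched bytes."""
    data = Path(path).read_bytes()
    has_original, has_patched = original in data, patched in data
    if has_patched and not has_original:
        return "patched"
    if has_original and not has_patched:
        return "unpatched"  # e.g. an OS X update replaced the file
    if has_original and has_patched:
        return "ambiguous"
    return "not found"  # neither pattern: different driver build?
```

Running patch_status("/System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib") after each update tells you whether the edit needs to be reapplied.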

#9 kostya82 (InsanelyMac Protégé, Members, 1 post, Gender: Male)

cmf, big thanks for the advice on how to enable full OpenCL. It works on my 560 Ti.

#10 sbl03 (InsanelyMac Protégé, Members, 35 posts)

riprod wrote:
> I have a 460 GTX, I've done netkas opencl fix. I tried your method above and when running Luxmark I get this error in Luxmark: 2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.


Grr, I get the same error with the same card after applying BOTH the netkas fix and this one, and restarting. I checked after the restart and the changes were still there. What's wrong :(

NVM! Somehow, my edits were wrong. Works now. Thanks.

#11 Florian U. (InsanelyMac Protégé, Members, 8 posts)

Hey there

I have a little issue here. I get the exact same error, although I am using a GT 440 card, which is a little different.
Bigger issue: I also have a GTX 285 installed (Mac Pro).

When I change that value to a fixed one, it changes for both cards, and the GTX 285 is a bit older and only supports sm1.3, I think. Not sure about the GT 440; it seems like it only supports sm1.0?!

Is there any chance I can use the patch to get both recognized as sm1.3 or sm1.0 so that they both work?

I tried Final Cut Pro X with a few rendering tests, and it seems like GT 440 + GTX 285 is slower than just the GTX 285 o.O

Any advice is appreciated.

THANK YOU
florian

#12 cmf (InsanelyMac Geek, Members, 145 posts)

sm 1.3:
31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90

sm 1.2:
31 C0 FF C0 89 06 FF C0 89 02 90 90 90 90 90 90

sm 1.1:
31 C0 FF C0 89 06 89 02 90 90 90 90 90 90 90 90

untested, but it should work ;)
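These sm 1.x variants follow the same recipe as the 2.0 patch: zero eax, increment it up to the major version and store it through rsi, then keep incrementing (or re-zero) up to the minor version and store it through rdx, padding the rest of the 16 bytes with NOPs. Here is a small Python sketch that builds such byte strings for any version, assuming the 10.7.x to 10.8.2 pattern (the 10.8.3+ pattern uses mov eax, imm32 instead and isn't covered); cc_patch and to_hex are hypothetical helper names. It reproduces the 2.0, 1.3, 1.2 and 1.1 strings in this thread. Keep in mind the earlier caveat: the value is hardcoded for every NVIDIA card in the machine.

```python
XOR_EAX = b"\x31\xC0"      # xor eax, eax
INC_EAX = b"\xFF\xC0"      # inc eax
STORE_MAJOR = b"\x89\x06"  # mov [rsi], eax
STORE_MINOR = b"\x89\x02"  # mov [rdx], eax
NOP = b"\x90"


def cc_patch(major, minor, size=16):
    """Build a same-size replacement that stores `major` via rsi and `minor` via rdx."""
    if major < 0 or minor < 0:
        raise ValueError("versions must be non-negative")
    code = XOR_EAX + INC_EAX * major + STORE_MAJOR
    if minor >= major:
        code += INC_EAX * (minor - major)  # keep counting up from the major value
    else:
        code += XOR_EAX + INC_EAX * minor  # restart from zero
    code += STORE_MINOR
    if len(code) > size:
        raise ValueError("does not fit in the original instruction bytes")
    return code + NOP * (size - len(code))  # pad so the file size is unchanged


def to_hex(data):
    """Format bytes the way they are written in this thread, e.g. '31 C0 ...'."""
    return " ".join(f"{b:02X}" for b in data)


if __name__ == "__main__":
    for major, minor in [(2, 0), (1, 3), (1, 2), (1, 1)]:
        print(f"sm {major}.{minor}: {to_hex(cc_patch(major, minor))}")
```

Counting up with inc keeps every instruction 2 bytes long, which is why even a 2.1 value (14 bytes) still fits in the original 16.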

#13 Florian U. (InsanelyMac Protégé, Members, 8 posts)

cmf wrote:
> sm 1.3:
> 31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90


Confirmed, is working. Thank you so much.

Got my GT440 and GTX285 now working in my MacPro4,1 (W3520) on Lion with ATY_init, openCL patch and CMF's patch. Still need to check if Final Cut Pro X will now render faster than with single GTX285.

#14 rominator (InsanelyMac Geek, Members, 141 posts)

I can confirm this works.

Tried on a GTX 460: full OpenCL on a Mac Pro 4,1.

#15 mitch_de (InsanelyMacaholic, Local Moderators, 2,884 posts, Gender: Male, Location: Stuttgart / Germany)

If it works, you're welcome to test your patch in the real world with the OceanWave OpenCL benchmark.
So far, no Fermi GPU has run that benchmark (from Apple) successfully.

http://www.insanelym...howtopic=268209

PS: Even if oclinfo runs fine (after patching), not all OpenCL apps will necessarily work. You may get runtime or compiler errors (OpenCL compiles kernels on the fly).

#16 cmf (InsanelyMac Geek, Members, 145 posts)

mitch_de wrote:
> If working, you are welcome to test your patch in real world by OceanWave OpenCL Benchmark.
> Until now, no Fermi GPU runs that benchmark (from Apple) with success.
>
> http://www.insanelym...howtopic=268209
>
> PS: Even if OCLINFO runs well (after patching), it may happen not all OpenCL apps work too. You may get runtime or Compilererrors (OpenCL compiles on the fly).

Your binary is kinda broken, so I compiled it myself and it does work:
Attached screenshot: Screen_Shot_2011_09_21_at_1.15.09_PM.png

#17 rominator (InsanelyMac Geek, Members, 141 posts)

cmf wrote:
> your binary is kinda broken, so i compiled it myself and it does work:
> Attached screenshot: Screen_Shot_2011_09_21_at_1.15.09_PM.png


Could someone post the fixed version or explain how to compile this?

Guess we know now why we have been getting 16 fps....

#18 Florian U. (InsanelyMac Protégé, Members, 8 posts)

rominator wrote:
> Could someone post the fixed version or explain how to compile this?
>
> Guess we know now why we have been getting 16 fps....


I don't think it's broken. You just need to open Terminal, cd into that directory and launch the application from there; otherwise it'll report missing files.

#19 SuperHack (InsanelyMac Protégé, Members, 39 posts)

Can someone please upload the edited file for the OpenCL fix?

#20 cmf (InsanelyMac Geek, Members, 145 posts)

Florian U. wrote:
> i dont think it is broken, you just need to use terminal and cd into that directory and launch the application from there -> else it'll report files missing

That's what I did, and it just segfaulted.

rominator wrote:
> Could someone post the fixed version or explain how to compile this?
>
> Guess we know now why we have been getting 16 fps....

Use the 10.7 SDK and compile, then add #include <OpenGL/OpenGL.h> and #include <OpenGL/gl.h> to the two files that give you compile errors and compile again; with luck, it then runs successfully. You'll still get lots of compile warnings in Xcode and when the OpenCL kernel is compiled, though.
I think this already tells you enough about the quality of this sample ...

SuperHack wrote:
> Can someone upload the edited file please for the OpenCL fix?

Sorry, no, I'm not using 10.7.1 any more. But if Apple keeps pushing out a beta every 5 to 7 days and I get annoyed enough, I'll probably write a program ;P






© 2014 InsanelyMac