
OpenCL fix for non-GF100/GF110 cards (aka CC/SM 2.1+)


cmf

138 posts in this topic


Note: This still applies for 10.7.4 and 10.8! No longer needed for 10.9!


good news everyone ;)

After I bought a GTX 560 Ti, I noticed a few odd things about the OpenCL support of this card.
It tells you that it's capable of all these things, but it actually isn't, and it will produce compile errors like "requires .target sm_12 or higher" even though it's an sm_21-capable card. So I started digging, and from the looks of it, Apple's OpenCL compiler only (directly) supports cards up to sm_20 (Quadro 4000, GTX 480/470/580/570). If the card reports anything higher than this, it will fall back to sm_10 or sm_11.

The solution: let's just pretend we have a 2.0 card :D

So, open up a hex editor of your liking and do this:
open /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib (as root or with sudo)

on 10.7.x and <=10.8.2:
find: 8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02
replace by: 31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90

on 10.8.3+ (as mentioned here):
find: 8B 81 1C 0C 00 00 EB 06 8B 81 20 0C 00 00
replace by: B8 02 00 00 00 90 EB 06 B8 00 00 00 00 90

save
reboot is not required, but recommended

What this basically does is replace the dynamic cc device info in clhDeviceComputeCapability with a hardcoded 2.0 "info". Note that this is x64 only for the moment (which most people have certainly been using since 10.7). I will add x86 support at a later point.
Also, if you have another non-sm2.0-capable NVIDIA card installed, this will (probably) break OpenCL support for it.
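
For anyone who would rather script the edit than do it by hand in a hex editor, here is a minimal Python sketch of the same find/replace. The function name apply_patch and its safety rules (equal-length patterns, exactly one match) are assumptions made for illustration, not anything taken from the driver; the byte strings are the 10.7.x / <=10.8.2 ones listed above. It only operates on a bytes object in memory, so reading and writing libclh.dylib (and keeping a backup copy) is left out on purpose.

```python
# Byte patterns copied from the 10.7.x / <=10.8.2 instructions in this post.
FIND_107 = bytes.fromhex("8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02")
REPLACE_SM20 = bytes.fromhex("31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90")


def apply_patch(data: bytes, find: bytes, replace: bytes) -> bytes:
    """Return a copy of data with the single occurrence of find swapped for replace.

    Hypothetical helper: refuses to act unless the pattern occurs exactly once
    and the replacement has the same length, so no offsets in the file shift.
    """
    if len(find) != len(replace):
        raise ValueError("find and replace must be the same length")
    count = data.count(find)
    if count != 1:
        raise ValueError(f"expected exactly one match, found {count}")
    offset = data.find(find)
    return data[:offset] + replace + data[offset + len(find):]


if __name__ == "__main__":
    # Synthetic stand-in for the file contents, NOT the real libclh.dylib.
    sample = b"\x55\x48\x89\xe5" + FIND_107 + b"\x5d\xc3"
    patched = apply_patch(sample, FIND_107, REPLACE_SM20)
    print(len(sample) == len(patched))
```

Raising on zero or multiple matches is a deliberately cautious choice: if the pattern isn't found, the file is probably from a different OS X version (or already patched) and shouldn't be touched blindly.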

Now, everything that did work before should still be working ...

[Device 0]
Name: GeForce GTX 560 Ti
Vendor: NVIDIA
Type: GPU
Device Version: OpenCL 1.1
Driver Version: CLH 1.0
Compute Units: 16
Work Group Size: 1024
Clock: 0 MHz
Global Memory: 1024 MB
Local Memory: 48 KB
Cache Size: 0 Bytes
Cache Line Size: 0 Bytes
Available: Yes
Double-Precision: No
Extensions (12):
cl_APPLE_ContextLoggingFunctions
cl_APPLE_SetMemObjectDestructor
cl_APPLE_clut
cl_APPLE_fp64_basic_ops
cl_APPLE_gl_sharing
cl_APPLE_query_kernel_names
cl_khr_byte_addressable_store
cl_khr_gl_event
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics

... but programs that use some advanced OpenCL features (e.g. LuxMark) should work now too:
post-42821-1314144738_thumb.png
post-42821-0-70997800-1328778494_thumb.png


I have a GTX 460 and I've applied the netkas OpenCL fix. I tried your method above, and when running LuxMark I get this error: 2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.



I haven't checked this on 10.7.0 or 10.7.1 yet, but I just checked the 10.7.0 nvidia drivers on 10.7.2 and it does work too (there aren't any relevant ptx/nvidia changes in OpenCL.framework, so this shouldn't matter).

 

It's either a GTX 460 issue or another issue altogether. I would guess the latter, since the error message you get from LuxMark without this fix is a different one ("- OpenCL ERROR: clBuildProgram(-11)").

Could you check if you have OpenCL support at all (click)?


I ran oclinfo and it indicates there is 1 OpenCL device found. This is what it says:

1 OpenCL device found!

[Device 0]
Name: Intel® Core2 Quad CPU Q8400 @ 2.66GHz
Vendor: Intel
Type: CPU
Device Version: OpenCL 1.1
Driver Version: 1.1
Compute Units: 4
Work Group Size: 1024
Clock: 3000 MHz
Global Memory (Total): 8192 MB
Global Memory (Host): 8192 MB
Global Memory (PCIe): 0 MB
Local Memory: 32 KB
Cache Size: 0.0625 KB
Cache Line Size: 2097152 Bytes
Available: Yes
Double-Precision: Yes
Extensions:
cl_APPLE_SetMemObjectDestructor
cl_APPLE_ContextLoggingFunctions
cl_APPLE_clut
cl_APPLE_query_kernel_names
cl_APPLE_gl_sharing
cl_khr_gl_event
cl_khr_fp64
cl_khr_global_int32_base_atomics
cl_khr_global_int32_extended_atomics
cl_khr_local_int32_base_atomics
cl_khr_local_int32_extended_atomics
cl_khr_byte_addressable_store
cl_khr_int64_base_atomics
cl_khr_int64_extended_atomics
cl_khr_3d_image_writes
cl_APPLE_fp64_basic_ops
cl_APPLE_fixed_alpha_channel_orders
cl_APPLE_biased_fixed_point_image_formats

Any ideas?



which os x version? have you really applied the initial opencl fix?

http://netkas.org/?p=794 (for 10.7.0 and 10.7.1) or http://netkas.org/?p=794#comment-173693 (for 10.7.2)

if this didn't work, try this on the console: echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

this will at least make it work partially (but not LuxMark, which seems to do some other weird stuff ...).
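
One thing to watch with the echo ... >> ~/.profile approach is that it appends another copy of the line every time it is run. A minimal Python sketch of an append-only-if-missing variant follows; ensure_line is a made-up helper name, and the demo writes to a temporary file rather than a real ~/.profile.

```python
import os
import tempfile

EXPORT_LINE = "export CL_ENABLE_SM2_DEVICE=1"


def ensure_line(path: str, line: str) -> bool:
    """Append line to the file at path unless it is already present.

    Hypothetical helper; returns True if the line was added, False otherwise.
    """
    content = ""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as fh:
            content = fh.read()
    if line in content.splitlines():
        return False
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(prefix + line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        profile = os.path.join(tmp, "profile")  # stand-in for ~/.profile
        print(ensure_line(profile, EXPORT_LINE))  # first call adds the line
        print(ensure_line(profile, EXPORT_LINE))  # second call leaves it alone
```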


which os x version? have you really applied the initial opencl fix?

 

I applied the OpenCL fix from netkas, but I think it was when I was on 10.7.0. Now my OS X version is 10.7.1. I will try to reapply the netkas OpenCL fix and see if that changes anything. I'll post back with any differences.


I took a look at my /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/GeForceGLDriver. I found that the netkas edits I had made were either not saved or had been overwritten in the update from 10.7.0 to 10.7.1. I reapplied the netkas hex edits and tried LuxMark v1.0, and it worked!

 

Thank you.

 

post-130852-1314296649_thumb.png


I have a 460 GTX, I've done netkas opencl fix. I tried your method above and when running Luxmark I get this error in Luxmark: 2011-08-24 12:02:31 - RUNTIME ERROR: Unable to find any appropiate IntersectionDevice.

 

Grr, I get the same error with the same card after doing BOTH netkas and this, and restarting. I checked that the changes were still there and they were, after the restart. What's wrong :(

 

NVM! Somehow, my edits were wrong. Works now. Thanks.


  • 2 weeks later...

Hey there

 

I have a little issue here. The error is the exact same one, although I am using a GT440 card, which is a little different.

Bigger issue: I also have a GTX285 installed. (mac pro)

 

When changing that value to a fixed value, both cards get changed, and the gtx285 is a bit older and supports only sm1.3 i think. not sure about gt440, seems like it only supports sm1.0 ?!

 

any chance I can use the patch to get them both recognized as sm1.3 or sm1.0, so that they will both work?

 

I tried final cut pro x with a few rendering tests, seems like gt440 + gtx285 is slower than just the gtx285 o.O

 

any advice is appreciated.

 

THANK YOU

florian


sm 1.3:

31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90
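
To see why these bytes mean "1.3" while the earlier replacement means "2.0", it helps to read them as x86-64 instructions: 31 C0 is xor eax, eax; FF C0 is inc eax; 89 06 is mov [rsi], eax; 89 02 is mov [rdx], eax; and 90 is nop. Below is a small Python sketch that interprets just these five encodings. The helper name decode_cc_patch is made up, and treating the first store as the major version and the second as the minor is an inference from the two patches in this thread, not something taken from driver documentation.

```python
SM20 = bytes.fromhex("31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90")
SM13 = bytes.fromhex("31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90")


def decode_cc_patch(code: bytes):
    """Interpret the few opcodes used by the patches; return (major, minor)."""
    eax = 0
    major = minor = None
    i = 0
    while i < len(code):
        op = code[i:i + 2]
        if op == b"\x31\xc0":      # xor eax, eax -> eax = 0
            eax = 0
            i += 2
        elif op == b"\xff\xc0":    # inc eax
            eax += 1
            i += 2
        elif op == b"\x89\x06":    # mov [rsi], eax (assumed: major version)
            major = eax
            i += 2
        elif op == b"\x89\x02":    # mov [rdx], eax (assumed: minor version)
            minor = eax
            i += 2
        elif code[i] == 0x90:      # nop, used as padding
            i += 1
        else:
            raise ValueError(f"unsupported byte at offset {i}: {code[i]:#04x}")
    return major, minor


print(decode_cc_patch(SM20))  # (2, 0)
print(decode_cc_patch(SM13))  # (1, 3)
```

By the same reading, a different hardcoded version would just need a different number of inc eax instructions before each store, with the nop padding adjusted so the total stays at 16 bytes.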

 

Confirmed, is working. Thank you so much.

 

Got my GT440 and GTX285 now working in my MacPro4,1 (W3520) on Lion with ATY_init, openCL patch and CMF's patch. Still need to check if Final Cut Pro X will now render faster than with single GTX285.


If it's working, you're welcome to test your patch in the real world with the OceanWave OpenCL benchmark.

So far, no Fermi GPU has run that benchmark (from Apple) successfully.

http://www.insanelymac.com/forum/index.php?showtopic=268209

PS: Even if oclinfo runs fine (after patching), not all OpenCL apps will necessarily work. You may still get runtime or compiler errors (OpenCL kernels are compiled on the fly).



your binary is kinda broken, so i compiled it myself and it does work:

post-42821-1316603823_thumb.png


Could someone post the fixed version or explain how to compile this?

 

Guess we know now why we have been getting 16 fps....

 

i dont think it is broken, you just need to use terminal and cd into that directory and launch the application from there -> else it'll report files missing


i dont think it is broken, you just need to use terminal and cd into that directory and launch the application from there -> else it'll report files missing

thats what i did and it just segfaulted.

Could someone post the fixed version or explain how to compile this?

 

Guess we know now why we have been getting 16 fps....

use 10.7 sdk, compile, add #include <OpenGL/OpenGL.h> and #include <OpenGL/gl.h> in the two files you get compile errors in, compile again, possibly run successfully. you'll still get lots of compile warnings in xcode and when compiling the opencl kernel though.

i think this already tells you enough about the quality of this sample ...

Can someone upload the edited file please for the OpenCL fix?

sry, no, not using 10.7.1 any more. but if apple continues to push out a beta every 5 - 7 days and i get annoyed enough, i'll probably write a program ;P



Hey cmf,

 

I am about to build my new computer in the next few weeks, after I buy all my parts. I am going to install following the ASUS P8P67 guide in the install forums, but I was wondering: should I do this directly after the install? Or should I, as I read earlier, apply the netkas OpenCL fix that people were trying and then do this?

 

Thank you for the find! I was about to switch my card of choice(gtx 560 ti) to the 6850 until I decided to take a quick look over at the hardware forums LOL :D


should I do this directly after install? Or should I, as I read earlier, install the netkas opencl that people were trying then do this?

the file is overwritten by every update, so yes: apply it after the install and again after each 10.7.x update.

and you need both opencl fixes on non-gf100/gf110 cards.
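
Since updates silently replace the file, it can be handy to check whether the patch is still in place before re-applying it. A rough Python sketch, assuming the 10.7.x / <=10.8.2 byte patterns from the first post; patch_status is a hypothetical helper, and in practice you would pass it the bytes read from libclh.dylib rather than the synthetic data used here.

```python
# 10.7.x / <=10.8.2 patterns from the first post of this thread.
ORIGINAL = bytes.fromhex("8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02")
PATCHED = bytes.fromhex("31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90")


def patch_status(data: bytes) -> str:
    """Classify file contents as 'patched', 'unpatched', or 'unknown'."""
    if PATCHED in data:
        return "patched"
    if ORIGINAL in data:
        return "unpatched"
    # Neither pattern found: possibly a different OS X version or another patch.
    return "unknown"


if __name__ == "__main__":
    # Synthetic examples only; these are not real driver contents.
    print(patch_status(b"\x00\x01" + ORIGINAL + b"\x02"))
    print(patch_status(b"\x00\x01" + PATCHED + b"\x02"))
    print(patch_status(b"\x00" * 64))
```

An "unknown" result doesn't necessarily mean something is wrong; it would also be what you get on 10.8.3+, where the byte patterns differ.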

 

This is working for my GTX480 :o

huh? this isn't required for gtx 480.


huh? this isn't required for gtx 480.

 

For my config it is, without this patch I don't have OpenCL working for my GTX 480. It might be related to the fact I have two GPUs on my HackinTosh: ATI HD6870 + NVidia GTX 480

 

pyrit benchmark
Pyrit 0.4.1-dev (svn r308) © 2008-2011 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

Running benchmark (61279.2 PMKs/s)... - 

Computed 61279.17 PMKs/s total.
#1: 'CUDA-Device #1 'GeForce GTX 480'': 25619.8 PMKs/s (RTT 2.9)
#2: 'OpenCL-Device 'ATI Radeon Barts XT Prototype'': 31135.4 PMKs/s (RTT 2.8)
#3: 'OpenCL-Device 'GeForce GTX 480'': 6744.2 PMKs/s (RTT 3.2)
#4: 'CPU-Core (SSE2)': 643.6 PMKs/s (RTT 3.0)
#5: 'CPU-Core (SSE2)': 625.2 PMKs/s (RTT 3.1)
#6: 'CPU-Core (SSE2)': 634.1 PMKs/s (RTT 3.0)
#7: 'CPU-Core (SSE2)': 619.5 PMKs/s (RTT 3.0)
#8: 'CPU-Core (SSE2)': 654.0 PMKs/s (RTT 3.0)

 

\o/

 

Without your patch, only CUDA is available for my GTX 480 ;)

 

I have combo upgraded to 11C73 today, let's see if your hack still works :

 

EDIT:

 

Before Hack, no more OpenCL for my GTX 480 :

 

imac-de-thireus:Desktop thireus$ pyrit list_cores
Pyrit 0.4.1-dev (svn r308) © 2008-2011 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

The following cores seem available...
#1:  'CUDA-Device #1 'GeForce GTX 480''
#2:  'OpenCL-Device 'ATI Radeon Barts XT Prototype''
#3:  'CPU-Core (SSE2)'
#4:  'CPU-Core (SSE2)'
#5:  'CPU-Core (SSE2)'
#6:  'CPU-Core (SSE2)'
#7:  'CPU-Core (SSE2)'
#8:  'CPU-Core (SSE2)'

 

[OpenCL-only Context]
2 OpenCL devices found!

[Device 0]
Name: 			Intel® Core(tm) i7-2600K CPU @ 3.40GHz
Vendor: 		Intel
Type: 			CPU 
Device Version: 	OpenCL 1.1 
Driver Version: 	1.1
Compute Units: 		8
Work Group Size: 	1024
Clock: 			3411 MHz
Global Memory (Total): 	8192 MB
Global Memory (Host): 	8192 MB
Global Memory (PCIe): 	0 MB
Local Memory: 		32 KB
Cache Size: 		0.0625 KB
Cache Line Size: 	8388608 Bytes
Available: 		Yes
Double-Precision: 	Yes
Extensions: 
			cl_APPLE_SetMemObjectDestructor
			cl_APPLE_ContextLoggingFunctions
			cl_APPLE_clut
			cl_APPLE_query_kernel_names
			cl_APPLE_gl_sharing
			cl_khr_gl_event
			cl_khr_fp64
			cl_khr_global_int32_base_atomics
			cl_khr_global_int32_extended_atomics
			cl_khr_local_int32_base_atomics
			cl_khr_local_int32_extended_atomics
			cl_khr_byte_addressable_store
			cl_khr_int64_base_atomics
			cl_khr_int64_extended_atomics
			cl_khr_3d_image_writes
			cl_APPLE_fp64_basic_ops
			cl_APPLE_fixed_alpha_channel_orders
			cl_APPLE_biased_fixed_point_image_formats

[Device 1]
Name: 			ATI Radeon Barts XT Prototype
Vendor: 		AMD
Type: 			GPU 
Device Version: 	OpenCL 1.1 
Driver Version: 	1.0
Compute Units: 		14
Work Group Size: 	1024
Clock: 			970 MHz
Global Memory: 		512 MB
Local Memory: 		32 KB
Cache Size: 		0 KB
Cache Line Size: 	0 Bytes
Available: 		Yes
Double-Precision: 	No
Extensions: 
			cl_APPLE_SetMemObjectDestructor
			cl_APPLE_ContextLoggingFunctions
			cl_APPLE_clut
			cl_APPLE_query_kernel_names
			cl_APPLE_gl_sharing
			cl_khr_gl_event
			cl_khr_global_int32_base_atomics
			cl_khr_global_int32_extended_atomics
			cl_khr_local_int32_base_atomics
			cl_khr_local_int32_extended_atomics
			cl_khr_byte_addressable_store
			cl_khr_3d_image_writes

 

Let's hack this stuff...

 

EDIT :

 

Back after patching :)

 

imac-de-thireus:~ thireus$ pyrit list_cores
Pyrit 0.4.1-dev (svn r308) © 2008-2011 Lukas Lueg http://pyrit.googlecode.com
This code is distributed under the GNU General Public License v3+

The following cores seem available...
#1:  'CUDA-Device #1 'GeForce GTX 480''
#2:  'OpenCL-Device 'ATI Radeon Barts XT Prototype''
#3:  'OpenCL-Device 'GeForce GTX 480''
#4:  'CPU-Core (SSE2)'
#5:  'CPU-Core (SSE2)'
#6:  'CPU-Core (SSE2)'
#7:  'CPU-Core (SSE2)'
#8:  'CPU-Core (SSE2)'

 

[OpenCL-only Context]
3 OpenCL devices found!

[Device 0]
Name: 			Intel® Core(tm) i7-2600K CPU @ 3.40GHz
Vendor: 		Intel
Type: 			CPU 
Device Version: 	OpenCL 1.1 
Driver Version: 	1.1
Compute Units: 		8
Work Group Size: 	1024
Clock: 			3411 MHz
Global Memory (Total): 	8192 MB
Global Memory (Host): 	8192 MB
Global Memory (PCIe): 	0 MB
Local Memory: 		32 KB
Cache Size: 		0.0625 KB
Cache Line Size: 	8388608 Bytes
Available: 		Yes
Double-Precision: 	Yes
Extensions: 
			cl_APPLE_SetMemObjectDestructor
			cl_APPLE_ContextLoggingFunctions
			cl_APPLE_clut
			cl_APPLE_query_kernel_names
			cl_APPLE_gl_sharing
			cl_khr_gl_event
			cl_khr_fp64
			cl_khr_global_int32_base_atomics
			cl_khr_global_int32_extended_atomics
			cl_khr_local_int32_base_atomics
			cl_khr_local_int32_extended_atomics
			cl_khr_byte_addressable_store
			cl_khr_int64_base_atomics
			cl_khr_int64_extended_atomics
			cl_khr_3d_image_writes
			cl_APPLE_fp64_basic_ops
			cl_APPLE_fixed_alpha_channel_orders
			cl_APPLE_biased_fixed_point_image_formats

[Device 1]
Name: 			ATI Radeon Barts XT Prototype
Vendor: 		AMD
Type: 			GPU 
Device Version: 	OpenCL 1.1 
Driver Version: 	1.0
Compute Units: 		14
Work Group Size: 	1024
Clock: 			970 MHz
Global Memory: 		512 MB
Local Memory: 		32 KB
Cache Size: 		0 KB
Cache Line Size: 	0 Bytes
Available: 		Yes
Double-Precision: 	No
Extensions: 
			cl_APPLE_SetMemObjectDestructor
			cl_APPLE_ContextLoggingFunctions
			cl_APPLE_clut
			cl_APPLE_query_kernel_names
			cl_APPLE_gl_sharing
			cl_khr_gl_event
			cl_khr_global_int32_base_atomics
			cl_khr_global_int32_extended_atomics
			cl_khr_local_int32_base_atomics
			cl_khr_local_int32_extended_atomics
			cl_khr_byte_addressable_store
			cl_khr_3d_image_writes

[Device 2]
Name: 			GeForce GTX 480
Vendor: 		NVIDIA
Type: 			GPU 
Device Version: 	OpenCL 1.0 
Driver Version: 	CLH 1.0
Compute Units: 		60
Work Group Size: 	1024
Clock: 			0 MHz
Global Memory: 		1536 MB
Local Memory: 		48 KB
Cache Size: 		0 KB
Cache Line Size: 	0 Bytes
Available: 		Yes
Double-Precision: 	No
Extensions: 
			cl_APPLE_SetMemObjectDestructor
			cl_APPLE_ContextLoggingFunctions
			cl_APPLE_clut
			cl_APPLE_query_kernel_names
			cl_APPLE_gl_sharing
			cl_khr_gl_event
			cl_khr_byte_addressable_store
			cl_khr_global_int32_base_atomics
			cl_khr_global_int32_extended_atomics
			cl_khr_local_int32_base_atomics
			cl_khr_local_int32_extended_atomics
			cl_APPLE_fp64_basic_ops

 

So do you have an explanation why I need your patch?

Also, can you tell me what's the latest version of OpenCL that should be detected for both GPUs ?

I don't understand what "sm1.3" stands for... SM = ? And I don't understand why sm2.0 patch doesn't work for my GTX 480 ;)

 

Little video about Galaxies benchmark : http://thireus.dareyourmind.net/OpenCL_GAL...4_VSYNC_OFF.zip


For my config it is, without this patch I don't have OpenCL working for my GTX 480. It might be related to the fact I have two GPUs on my HackinTosh: ATI HD6870 + NVidia GTX 480

 

So do you have an explanation why I need your patch?

Also, can you tell me what's the latest version of OpenCL that should be detected for both GPUs ?

I don't understand what "sm1.3" stands for... SM = ? And I don't understand why sm2.0 patch doesn't work for my GTX 480 :)

k, this is weird and interesting. but yes, it is probably because you have an ati card installed as your primary card.

two things you could try:

1) swap the cards, so the nvidia card is your primary card (and then try again with and without the sm 2.0 fix)

2) as i mentioned in an earlier post, type this in terminal: echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

 

concerning sm/cc: http://developer.nvidia.com/cuda-gpus aka "what your gpu is capable of" (e.g. double precision fp, local memory atomics, unified addressing)

sm/cc 1.x will give opencl device version 1.0, 2.x will give you opencl 1.1.
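
As a quick illustration of that rule, here is a tiny Python sketch. The function name is made up, and it only covers the 1.x and 2.x cases mentioned here; anything else is rejected rather than guessed.

```python
def opencl_device_version(cc_major: int) -> str:
    """Map the major sm/cc version to the OpenCL device version described above."""
    if cc_major == 1:
        return "OpenCL 1.0"
    if cc_major == 2:
        return "OpenCL 1.1"
    raise ValueError(f"no rule given in this thread for sm/cc {cc_major}.x")


print(opencl_device_version(1))  # OpenCL 1.0
print(opencl_device_version(2))  # OpenCL 1.1
```

This matches the oclinfo dumps in the thread: the GTX 560 Ti with the 2.0 patch reports OpenCL 1.1, while the GTX 480 listing above reports OpenCL 1.0.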
