
OpenCL fix for non-GF100/GF110 cards (aka CC/SM 2.1+)


137 replies to this topic

#41
ugokind

ugokind

    InsanelyMac Deity

  • Donators
  • 1,712 posts
  • Gender:Male
  • Location:10100
  • Interests:Apicoltura
    Mac
    Linux
    Homebrew
    Australia
    Spremermilcervello
LuxMark always fails with an error, but other OpenCL benchmarks run!
Why?

2012-02-01 22:19:40 - [RenderConfig] scene.file = scenes/sala/sala.scn
2012-02-01 22:19:42 - [LuxRays] OpenCL Platform 0: Apple
2012-02-01 22:19:42 - [LuxRays] Device 0 NativeThread name: NativeThread-000
2012-02-01 22:19:42 - [LuxRays] Device 1 NativeThread name: NativeThread-001
2012-02-01 22:19:42 - [LuxRays] Device 2 NativeThread name: NativeThread-002
2012-02-01 22:19:42 - [LuxRays] Device 3 NativeThread name: NativeThread-003
2012-02-01 22:19:42 - [LuxRays] Device 4 NativeThread name: NativeThread-004
2012-02-01 22:19:42 - [LuxRays] Device 5 NativeThread name: NativeThread-005
2012-02-01 22:19:42 - [LuxRays] Device 6 NativeThread name: NativeThread-006
2012-02-01 22:19:42 - [LuxRays] Device 7 NativeThread name: NativeThread-007
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL name: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL type: CPU
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL units: 8
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL max allocable memory: 6144MBytes
2012-02-01 22:19:42 - RUNTIME ERROR: No OpenCL device selected or available


#42
oSxFr33k

oSxFr33k

    InsanelyMac Legend

  • Members
  • 811 posts
  • Gender:Male
  • Interests:Sound and Graphic Design. Electronics in general.
Are the hex edits different after updating to 10.7.3, or are they the same location and same string within the bundled files?

#43
tamorgen

tamorgen

    InsanelyMac Geek

  • Members
  • 130 posts
  • Gender:Male
  • Location:Maryland

LuxMark always fails with an error, but other OpenCL benchmarks run!
Why?

2012-02-01 22:19:40 - [RenderConfig] scene.file = scenes/sala/sala.scn
2012-02-01 22:19:42 - [LuxRays] OpenCL Platform 0: Apple
2012-02-01 22:19:42 - [LuxRays] Device 0 NativeThread name: NativeThread-000
2012-02-01 22:19:42 - [LuxRays] Device 1 NativeThread name: NativeThread-001
2012-02-01 22:19:42 - [LuxRays] Device 2 NativeThread name: NativeThread-002
2012-02-01 22:19:42 - [LuxRays] Device 3 NativeThread name: NativeThread-003
2012-02-01 22:19:42 - [LuxRays] Device 4 NativeThread name: NativeThread-004
2012-02-01 22:19:42 - [LuxRays] Device 5 NativeThread name: NativeThread-005
2012-02-01 22:19:42 - [LuxRays] Device 6 NativeThread name: NativeThread-006
2012-02-01 22:19:42 - [LuxRays] Device 7 NativeThread name: NativeThread-007
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL name: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL type: CPU
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL units: 8
2012-02-01 22:19:42 - [LuxRays] Device 8 OpenCL max allocable memory: 6144MBytes
2012-02-01 22:19:42 - RUNTIME ERROR: No OpenCL device selected or available


I'm now getting these same errors after updating to 10.7.3, whereas with 10.7.2 there were no errors.

Are the hex edits different after updating to 10.7.3, or are they the same location and same string within the bundled files?


I'm thinking something has changed in 10.7.3. My Cinebench score dropped from 45.22 to 34.96. I had used a script to perform this update, which worked on 10.7.2, but does not on 10.7.3. Many other users are reporting a 25% drop in their benchmark scores. Could OpenCL 2.0 support be the culprit?

#44
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts
echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile
seems to be absolutely necessary now. it won't get recognized otherwise, even with binary patches.

Are the hex edits different after updating to 10.7.3, or are they the same location and same string within the bundled files?

not sure about the locations, but the hex strings are still the same as in 10.7.2.


looking into luxmark right now, will report back when i figured out whose fault it is ;)

#45
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts
sry for double post, but it does indeed work :)

dunno what i did wrong the first time, but os x somehow overwrote my changes (bad beta 10.7.3 -> 10.7.3 upgrade i guess ...).




#46
Sysyphus

Sysyphus

    InsanelyMac Geek

  • Members
  • 137 posts
  • Gender:Male
  • Location:3rd rock from that big yellow shiny thing in the sky!

sm 1.3:


31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90
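To see why this byte string corresponds to "sm 1.3", here is a minimal Python sketch that interprets only the handful of x86 opcodes appearing in it. The function name `decode_sm_patch` is made up for illustration, and the sketch assumes that the first store (`89 06`) receives the major version and the second (`89 02`) the minor version, which is what the "sm 1.3" label suggests.

```python
from typing import Tuple


def decode_sm_patch(hex_bytes: str) -> Tuple[int, int]:
    """Interpret the few opcodes used in the replacement bytes.

    Assumption: the value stored via 89 06 is the major version and the
    value stored via 89 02 is the minor version.
    """
    code = bytes.fromhex(hex_bytes)
    eax = 0
    stored = {}
    i = 0
    while i < len(code):
        if code[i] == 0x90:              # nop (single byte)
            i += 1
            continue
        op = code[i:i + 2]
        if op == b"\x31\xc0":            # xor eax, eax  -> eax = 0
            eax = 0
        elif op == b"\xff\xc0":          # inc eax
            eax += 1
        elif op == b"\x89\x06":          # mov [esi], eax (assumed: major)
            stored["major"] = eax
        elif op == b"\x89\x02":          # mov [edx], eax (assumed: minor)
            stored["minor"] = eax
        else:
            raise ValueError(f"unsupported opcode at offset {i}: {op.hex()}")
        i += 2
    return stored["major"], stored["minor"]


print(decode_sm_patch("31 C0 FF C0 89 06 FF C0 FF C0 89 02 90 90 90 90"))  # (1, 3)
```

The four trailing `90` bytes are NOPs that pad the replacement to the same 16-byte length as the code it overwrites.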




Just managed to get OpenCL working on a MSI N550GTX-Ti Cyclone II (http://www.newegg.com/Product/Product.aspx?Item=N82E16814127573)



#47
Carstiman

Carstiman

    InsanelyMac Geek

  • Members
  • 116 posts
  • Gender:Male
hello, GTS450 working

GeForceGLDriver.bundle/Contents/MacOS/GeForceGLDriver:
78 E8 83 F8 02 7C 11
replaced by:
78 E8 83 F8 03 7C 11

GeForceGLDriver.bundle/Contents/MacOS/libclh.dylib:
8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02
replaced by:
31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90
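For anyone who would rather script this than use a hex editor, the following is a rough Python sketch of a same-length find-and-replace on a byte buffer. `patch_bytes` is a hypothetical helper, not an existing tool, and the rule that the pattern must occur exactly once is a safety choice assumed here, not something stated in this thread. The demo runs on an in-memory buffer; always work on a backup copy of the real file.

```python
def patch_bytes(data: bytes, find_hex: str, replace_hex: str) -> bytes:
    """Return a copy of data with one occurrence of find_hex replaced.

    Assumption: refusing to patch unless the pattern occurs exactly once
    is a precaution chosen for this sketch.
    """
    find = bytes.fromhex(find_hex)
    repl = bytes.fromhex(replace_hex)
    if len(find) != len(repl):
        raise ValueError("find and replace patterns must have the same length")
    hits = data.count(find)
    if hits != 1:
        raise ValueError(f"expected exactly one match, found {hits}")
    return data.replace(find, repl)


FIND = "8B 87 1C 0C 00 00 89 06 8B 87 20 0C 00 00 89 02"
REPLACE = "31 C0 FF C0 FF C0 89 06 31 C0 89 02 90 90 90 90"

# Synthetic stand-in for the contents of libclh.dylib.
sample = b"\xAA" * 8 + bytes.fromhex(FIND) + b"\xBB" * 8
patched = patch_bytes(sample, FIND, REPLACE)
print(len(sample) == len(patched))  # True: the file size is unchanged
```

To use it on a real file, read the file with `open(path, "rb")`, pass the contents through `patch_bytes`, and write the result to a new file.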

Insert the device ID into the Info.plist of "NVDAGF100Hal.kext":

<key>IOPCIPrimaryMatch</key>
<string>
0x06c010de&0xffe0ffff
0x0dc010de&0xffc0ffff
0x0e2010de&0xffe0ffff
0x0ee010de&0xffe0ffff
0x0f0010de&0xffc0ffff
0x104010de&0xffc0ffff
0x124010de&0xffc0ffff
0x0dc410de&0xffc0ffff
</string>
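As a rough way to check whether a given card is covered by such a list, here is a Python sketch. It assumes each entry has the form `0xDDDDVVVV&0xMASK` (device ID in the upper 16 bits, vendor ID in the lower 16) and that an entry matches when the card's combined ID and the entry's value agree after both are ANDed with the mask. That matching rule is my understanding of IOKit's PCI matching, not something confirmed in this thread, and the helper names are made up.

```python
def parse_entry(entry: str):
    """Split '0xVALUE&0xMASK' into integers; a missing mask means all bits."""
    value, _, mask = entry.partition("&")
    return int(value, 16), (int(mask, 16) if mask else 0xFFFFFFFF)


def matching_entries(device_id: int, vendor_id: int, entries: str):
    """Return the entries that cover the given PCI device (assumed rule)."""
    reg = (device_id << 16) | vendor_id
    hits = []
    for entry in entries.split():
        value, mask = parse_entry(entry)
        if (reg & mask) == (value & mask):
            hits.append(entry)
    return hits


ENTRIES = """
0x06c010de&0xffe0ffff
0x0dc010de&0xffc0ffff
0x0e2010de&0xffe0ffff
0x0ee010de&0xffe0ffff
0x0f0010de&0xffc0ffff
0x104010de&0xffc0ffff
0x124010de&0xffc0ffff
0x0dc410de&0xffc0ffff
"""

# Device 0x0DC4 with NVIDIA's vendor ID 0x10DE, taken from the last entry.
print(matching_entries(0x0DC4, 0x10DE, ENTRIES))
```

Under that assumed rule, the output lists every entry that would claim the device, which makes it easy to see whether a newly added line is actually needed.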

but BOINC (http://boinc.berkeley.edu/) won't recognize OpenCL like it did before in 10.7.2




#48
tamorgen

tamorgen

    InsanelyMac Geek

  • Members
  • 130 posts
  • Gender:Male
  • Location:Maryland
I'm not sure if this has anything to do with the OpenCL 2.0 fix or not, but I just verified that I am not getting QE/CI with 10.7.3. I tried adding a widget to the Dashboard, and I no longer get the ripple effect as I previously did.

#49
oSxFr33k

oSxFr33k

    InsanelyMac Legend

  • Members
  • 811 posts
  • Gender:Male
  • Interests:Sound and Graphic Design. Electronics in general.
@cmf,

Same hex code in 10.7.2 as in 10.7.3 for the libclh.dylib, but there is a change in the GeForceGLDriver; see below.
From Netkas Thread:


Find
EB A8 83 F8 02 7C 15
replace 02 with 03 to get
EB A8 83 F8 03 7C 15
Find
78 E8 83 F8 02 7C 11
replace 02 with 03 to get
78 E8 83 F8 03 7C 11

I cannot find the second set "78 E8 83 F8 02 7C 11"

@cmf

What exactly is this command for or what will it do?


echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile



Thanks


EDITED A FEW MINUTES LATER:

My fault: I was using CrossOver with the Windows hex editor HxD, and for some reason it did not find the second hex string. I ran a Mac hex editor and it found it.

Sorry!!

I still have the question about the export CL_ENABLE_SM2_DEVICE=1

#50
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts

What exactly is this command for or what will it do?

echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

it's basically the same as the GeForceGLDriver binary patch, but at the user level (and w/o modifying the binary, obviously).
i.e. if the binary patch isn't applied, opencl will only function if "CL_ENABLE_SM2_DEVICE" is defined/set in the user's environment vars (the default set can be changed through the .profile file).
so, if apple ever decides to add an additional "is device opencl capable?" check or shuffle the code around, so that the binary patch doesn't work any more (which i'm surprised it did in 10.7.3), opencl should still work (to some extent*) using the "CL_ENABLE_SM2_DEVICE" define.

downside and (*): this really only works if a program is started by the user and the program doesn't ignore/overwrite the cl define or does some other weird stuff (like luxmark does ...). some system services/programs that use opencl and are started by another user also won't be able to use the opencl device (just open activity monitor and look which processes aren't owned by your user).


and ftr and the people that are interested:
the GeForceGLDriver binary patch changes the "cmp eax, 2" to a "cmp eax, 3" (note: eax at this point contains the major version of the nvidia compute capability (cc/sm), which is 2 for fermi gpus), so the subsequent jump condition will evaluate to true (instead of false, b/c 2 is not less than 2, but 2 < 3!) and continue @loc_8F014446. if the binary patch has not been applied (it still says "cmp eax, 2"), it will continue and check if "CL_ENABLE_SM2_DEVICE" is defined in the users env vars. if so, it will also continue @loc_8F014446. if not, the device will be "destroyed" and be "declared" not opencl capable.
Attached File  Screen Shot 2012-02-14 at 2.26.27 PM.png   307.49KB   67 downloads
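The check described above can be modeled in a few lines of Python. This is only a sketch of the logic as explained in this post, not the driver's actual code: `device_is_opencl_capable` and the `threshold` parameter are made-up names, where `threshold=2` stands for the original "cmp eax, 2" and `threshold=3` for the patched "cmp eax, 3".

```python
import os


def device_is_opencl_capable(cc_major, env=None, threshold=2):
    """Model of the capability check as described in the post.

    cc_major:  major compute capability (2 for fermi gpus)
    threshold: 2 = unpatched binary, 3 = patched binary (illustrative)
    env:       mapping of environment variables, defaults to os.environ
    """
    env = os.environ if env is None else env
    # "cmp eax, <threshold>" followed by a jump-if-less: taken when
    # cc_major < threshold, in which case the device is accepted.
    if cc_major < threshold:
        return True
    # Otherwise the device is only accepted if the variable is defined.
    return "CL_ENABLE_SM2_DEVICE" in env


print(device_is_opencl_capable(2, env={}))                             # False
print(device_is_opencl_capable(2, env={"CL_ENABLE_SM2_DEVICE": "1"}))  # True
print(device_is_opencl_capable(2, env={}, threshold=3))                # True
```

The three calls correspond to an unpatched driver without the variable, an unpatched driver with the variable set, and a patched driver.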

#51
tamorgen

tamorgen

    InsanelyMac Geek

  • Members
  • 130 posts
  • Gender:Male
  • Location:Maryland
@cmf,
I did the edits to the GeForceGLDriver and libclh.dylib. From your description, it sounds like I DO NOT need to run

echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

correct?



Also, are you getting even remotely the same benchmarks as you did in 10.7.2?

I've done all the correct manual edits to AGPM.kext, NVDAGF100Hal, and GeForceGLDriver, but after the update to 10.7.3, my score dropped from 45.22 to 35.67. My Luxmark 1.0 Score dropped from 7115 to 4550. Also, with LuxMark 2.0, it does not show my clock speed (it actually shows 0 MHz), and I get a whopping 133 score for the Room Scene.

I know it's comparing apples and oranges (or Apples & Windows), but my GTX 570 HD gets a 55.70 Cinebench score in Win 7 x64. 45 wasn't great, but it was at least in the ballpark. 35 is barely above half of the Windows score.

#52
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts

@cmf,
I did the edits to the GeForceGLDriver and libclh.dylib. From your description, it sounds like I DO NOT need to run

echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

correct?

yep, not necessary for now.

Also, are you getting even remotely the same benchmarks as you did in 10.7.2?

I've done all the correct manual edits to AGPM.kext, NVDAGF100Hal, and GeForceGLDriver, but after the update to 10.7.3, my score dropped from 45.22 to 35.67. My Luxmark 1.0 Score dropped from 7115 to 4550. Also, with LuxMark 2.0, it does not show my clock speed (it actually shows 0 MHz), and I get a whopping 133 score for the Room Scene.

I know it's comparing apples and oranges (or Apples & Windows), but my GTX 570 HD gets a 55.70 Cinebench score in Win 7 x64. 45 wasn't great, but it was at least in the ballpark. 35 is barely above half of the Windows score.

uhm, hard to say, at least for the opencl part. i'd say it's about the same, and depending on what you're doing apple's opencl implementation can be faster or slower than nvidia's implementation.
for the opengl part, well, opengl performance on os x has always sucked and nvidia's opengl drivers on windows/linux/... are far superior (including shader compiler optimizations, which apple should actually be good at ...).
and concerning benchmarks: they are not everything. things can easily go wrong on both sides. and i'd like to think that apple has better tests/indicators than some benchmark to tell if their implementation's performance has dropped by 50% or whatever.

and correct clock speed (+cache sizes) display has sadly never worked for fermi gpus on 10.7.

#53
oSxFr33k

oSxFr33k

    InsanelyMac Legend

  • Members
  • 811 posts
  • Gender:Male
  • Interests:Sound and Graphic Design. Electronics in general.
@cmf,

Thanks very much for the explanation. So basically the echo command writes to a user configuration file, which I ran anyhow even though I did all the necessary hex edits for 10.7.3. Should I remove that line from the user profile since it's not needed, or just leave things alone?

Thanks for your great thread!!!!

#54
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts

@cmf,

Thanks very much for the explanation. So basically the echo command writes to a user configuration file, which I ran anyhow even though I did all the necessary hex edits for 10.7.3. Should I remove that line from the user profile since it's not needed, or just leave things alone?

Thanks for your great thread!!!!

it doesn't hurt you if you keep it in and it might be beneficial (for a while) when 10.7.4 comes along ;)

#55
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts
small 10.8 update (with a disclaimer that i haven't tested anything of this yet):
apple did in fact shuffle some code around and the GeForceGLDriver code that had to be patched to get opencl working is now located in libclh.dylib. however, it seems like the patch is _not_ required any more (and neither is the other libclh.dylib patch, although it would still be valid), b/c either opencl is fully supported on fermi gpus now (most likely) or they moved the check someplace else that i haven't found yet.

some other interesting tidbits (for which i should probably start a new and highly speculative thread ;)):
* there are some references to opengl 4 and opengl es 2 support (since i couldn't test it on my hackintosh yet, i'm not sure if an opengl 4 context can be created ... i however failed to create an opengl 3.3 context on my oldish mbp, which could either mean opengl 3.3 won't be supported at all or apple is still working on it - hoping for the latter and full opengl 4.0+ support of course)
* there are some very interesting kepler related functions/strings in libclh and geforcegldriver (which could mean three things: 1) those are simply remains of nvidias driver code, 2) apple is heavily testing kepler gpus, but nothing is certain yet, 3) next-gen mbps will indeed have a kepler gpu ;)))
* gl drivers are based on R295 which is the most current nvidia driver version for windows/linux/etc. (hoping for better performance and stability)
* opencl version is still 1.1

#56
lexins

lexins

    InsanelyMac Protégé

  • Members
  • 12 posts
  • Gender:Male
  • Location:Moscow, Russia
Patching /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/GeForceGLDriver does not work for ML DP3.
Any ideas?

#57
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts

Patching /System/Library/Extensions/GeForceGLDriver.bundle/Contents/MacOS/GeForceGLDriver does not work for ML DP3.
Any ideas?

GeForceGLDriver doesn't need to be patched with 10.8 any more.
libclh.dylib however does (if you don't have gf100/gf110 card) and the fix mentioned in the first post still works.


and ftr: dp3 and dp3 update 1 were kinda borked. dp3 update 2 seems to have fixed everything and works really nice so far :)

edit: interestingly, the size of libclh.dylib has increased by 2 mb in dp3. since there is no opencl 1.2 support for nvidia cards yet, i'm guessing kepler support (there are lots of kepler functions in there ...).

#58
oSxFr33k

oSxFr33k

    InsanelyMac Legend

  • Members
  • 811 posts
  • Gender:Male
  • Interests:Sound and Graphic Design. Electronics in general.

@cmf,



Do we need to hex patch the GeForceGLDriver binary file after the 10.7.4 update? I ask because I see it is not needed for 10.8.


If we do not need to Hex edit it for 10.7.4 should we still run the script below?




echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile



I do understand we still need to patch libclh.dylib as noted in your first post that was edited recently making notes for 10.7.4 and 10.8.



#59
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts

Do we need to hex patch the GeForceGLDriver binary file after the 10.7.4 update? I ask because I see it is not needed for 10.8.

yes, the 10.7.2/10.7.3 fix still works.

If we do not need to Hex edit it for 10.7.4 should we still run the script below?




echo "export CL_ENABLE_SM2_DEVICE=1" >> ~/.profile

not necessarily. as long as the GeForceGLDriver binary fix is working, you don't need it.

#60
cmf

cmf

    InsanelyMac Geek

  • Members
  • 145 posts
just installed 10.8 dp4 and opencl now works ootb on non-gf100/gf110 cards, so this fix is no longer required :)




