
Fermi Freeze "Investigation"


dan542

62 posts in this topic


Hi all,

 

I have a GTX 560 Ti and it suffers from the so-called "Fermi Freeze". What is that? Basically, once in a while my whole system freezes, except for the cursor. Music still plays, and I can still ssh into my computer. I can find something like "NVDA(OpenGL): Channel exception" in the kernel log (sudo dmesg in a terminal (through ssh ;)) or in Console.app after reboot). From what I've found on the internet, channels are some sort of multitasking mechanism on the GPU, and each app that uses OpenGL should have its own channel.
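For anyone who wants to check a saved log for these messages, here is a minimal Python sketch. It assumes the kernel log has already been dumped to text (for example with sudo dmesg > dmesg.txt). The helper name find_channel_exceptions and the sample lines are made up for illustration; only the "NVDA(...): Channel exception" wording is taken from the message quoted above.

```python
import re

# Matches e.g. "NVDA(OpenGL): Channel exception"; the component in
# parentheses (OpenGL, private, ...) is captured so it can be counted.
CHANNEL_EXCEPTION = re.compile(r"NVDA\((\w+)\): Channel exception")


def find_channel_exceptions(log_text):
    """Return a dict mapping component name to number of channel exceptions."""
    counts = {}
    for line in log_text.splitlines():
        match = CHANNEL_EXCEPTION.search(line)
        if match:
            component = match.group(1)
            counts[component] = counts.get(component, 0) + 1
    return counts


# Hypothetical sample; real log lines carry more detail after the message.
sample_log = """\
NVDA(OpenGL): Channel exception! (made-up trailing detail)
Some unrelated kernel message
NVDA(private): Channel exception! (made-up trailing detail)
NVDA(OpenGL): Channel exception! (made-up trailing detail)
"""

print(find_channel_exceptions(sample_log))
```

Counting per component makes it easy to compare logs from before and after a change (a different driver, SMBIOS, or OS build).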

 

I've seen some users trying to tweak their AGPM.kext so that their card never enters the lowest power state. The card indeed does not go into the lowest power state, but as it turns out, the GPU being in the lowest power state is NOT the problem. You will still have freezes on a Fermi GPU even if you edit your AGPM.

 

Also, I've seen some people using apps that give them an animated wallpaper, running the iTunes visualizer, or running CUDA-Test.app (it just draws a bunch of triangles using OpenGL and READS OUT some data USING CUDA). And these solutions do indeed work.

 

So, seeing that, I made a simple app that draws ONE triangle using OpenGL and is invisible. I've added that app to my startup items and since then I've never experienced the Fermi freeze again. In case anyone's interested in using it, the app is here and its source code is here. Feel free to use both the binary and its source code under the terms of WTFPL.

 

My hypothesis is that when my app is running, it uses one (or more?) channels and for some reason the thing that manages them (NVDAGF100Hal(Web).kext?) never does something bad. What "something bad" is, I don't know. It might be closing all channels? But then why would my cursor stay?

 

Now this does work, but I don't like this solution at all… I'm always afraid that my system will freeze when I do a system update. So far it hasn't happened to me, but the possibility is there, since my app doesn't run when I'm not logged in.

 

So I was thinking, assuming my hypothesis is right, it should be possible to just allocate one channel on startup and free it on shutdown, ideally as a kext.

 

 

I have a question for all kernel hackers on this site:

I was looking at the symbols in NVDAGF100HalWeb.kext:

daniel-pc:MacOS daniel$ pwd
/System/Library/Extensions/NVDAGF100HalWeb.kext/Contents/MacOS
daniel-pc:MacOS daniel$ nm -m NVDAGF100HalWeb | grep --color -i channel
                 (undefined) external _CliDispGetDispChannelInfo (dynamically looked up)
                 (undefined) external _CliFindDispChannelObject (dynamically looked up)
                 (undefined) external _CliGetChannelClassInfo (dynamically looked up)
                 (undefined) external _CliGetChannelID (dynamically looked up)
                 (undefined) external _CliGetChannelIDFromDeviceFifo (dynamically looked up)
                 (undefined) external _bifConfigureVirtualChannel_STUB (dynamically looked up)
                 (undefined) external _cipherExecuteDHKeyExchangeOnChannel_STUB (dynamically looked up)
                 (undefined) external _dispAllocChannel_STUB (dynamically looked up)
                 (undefined) external _dispEnsureChannelState_STUB (dynamically looked up)
                 (undefined) external _dispFreeChannel_STUB (dynamically looked up)
                 (undefined) external _dispGetChannelClassAndInstance_STUB (dynamically looked up)
                 (undefined) external _dispGetChannelNum_STUB (dynamically looked up)
                 (undefined) external _dispGetHwChannelStateMask_STUB (dynamically looked up)
                 (undefined) external _dispGetNumChannelsWithExceptArgs_STUB (dynamically looked up)
                 (undefined) external _dispGetNumChannels_STUB (dynamically looked up)
                 (undefined) external _dispGrabChannel_STUB (dynamically looked up)
                 (undefined) external _dispIsChannelAllocated_STUB (dynamically looked up)
                 (undefined) external _dispQuiescentChannel_STUB (dynamically looked up)
                 (undefined) external _dispReadAwakenAndExceptionChannelNumMask_STUB (dynamically looked up)
                 (undefined) external _dispReadChannelState_STUB (dynamically looked up)
                 (undefined) external _dispResetAwakenAndExceptionChannelNumMask_STUB (dynamically looked up)
                 (undefined) external _dispRestoreChannels_STUB (dynamically looked up)
                 (undefined) external _dispStopChannel_STUB (dynamically looked up)
                 (undefined) external _dmaBindToChannel_STUB (dynamically looked up)
                 (undefined) external _dmaUnbindFromFifoChannel_STUB (dynamically looked up)
                 (undefined) external _fbScrubInitScheduleChannel_STUB (dynamically looked up)
                 (undefined) external _fifoChannelGroupGetDefaultTimeslice_STUB (dynamically looked up)
                 (undefined) external _fifoChannelGroupSetTimeslice_STUB (dynamically looked up)
                 (undefined) external _fifoErrorResetInitChannelMap_STUB (dynamically looked up)
                 (undefined) external _fifoGetChannelDeviceShadowPram_STUB (dynamically looked up)
                 (undefined) external _fifoGetChannelEngMask_STUB (dynamically looked up)
                 (undefined) external _fifoGetChannelNextPram_STUB (dynamically looked up)
                 (undefined) external _fifoGetChannelPram_STUB (dynamically looked up)
                 (undefined) external _fifoGetMaxChannelGroupSize_STUB (dynamically looked up)
                 (undefined) external _fifoGetMaxChannelGroups_STUB (dynamically looked up)
                 (undefined) external _fifoLoadChannel_STUB (dynamically looked up)
                 (undefined) external _fifoPreemptAndDisableChannels_STUB (dynamically looked up)
                 (undefined) external _fifoRunlistGroupChannels_STUB (dynamically looked up)
                 (undefined) external _fifoRunlistWriteChannelEntry_STUB (dynamically looked up)
                 (undefined) external _fifoSetChannelEnableByEngine_STUB (dynamically looked up)
                 (undefined) external _fifoSetChannelGroupInUse_STUB (dynamically looked up)
                 (undefined) external _fifoSetSLIVirtualChannelCtx_STUB (dynamically looked up)
                 (undefined) external _fifoSetUpChannelDmaOffset_STUB (dynamically looked up)
                 (undefined) external _fifoSetUpChannelPio_STUB (dynamically looked up)
                 (undefined) external _fifoStompChannel_STUB (dynamically looked up)
                 (undefined) external _fifoUpdateSubChannelContext_STUB (dynamically looked up)
                 (undefined) external _grLoadChannelContextSw_STUB (dynamically looked up)
                 (undefined) external _rcStompChannel_STUB (dynamically looked up)
                 (undefined) external _rcWatchdogInitScheduleChannel_STUB (dynamically looked up)
daniel-pc:MacOS daniel$ 
Is it possible to create a kext and somehow call _dispAllocChannel_STUB(and the arguments that belong here) and _dispFreeChannel_STUB(and here) from it? It'd also be helpful to find out what these functions return. I don't have any experience with kernel development… I do have some experience developing for iOS and OS X, however.
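To get an overview of that listing, here is a small Python sketch that groups the symbols by their name prefix (disp, fifo, Cli, ...). It assumes the nm -m output has been saved as text. The helper name group_channel_symbols is hypothetical, and the sample lines are copied from the listing above. Note that nm marks all of these as "(undefined)", which means this binary references the symbols but does not define them, so the implementations live in some other binary.

```python
import re
from collections import defaultdict

# One line of `nm -m` output for an undefined symbol, as in the listing above.
NM_LINE = re.compile(r"\(undefined\) external (_\w+) \(dynamically looked up\)")


def group_channel_symbols(nm_output):
    """Group undefined symbols mentioning 'channel' by their name prefix."""
    groups = defaultdict(list)
    for line in nm_output.splitlines():
        match = NM_LINE.search(line)
        if not match:
            continue
        symbol = match.group(1)
        if "channel" not in symbol.lower():
            continue
        # The prefix is the first word of the name after the underscore,
        # e.g. "disp" in _dispAllocChannel_STUB or "Cli" in _CliGetChannelID.
        prefix = re.match(r"_([A-Z]?[a-z]+)", symbol)
        groups[prefix.group(1) if prefix else "other"].append(symbol)
    return dict(groups)


sample = """\
                 (undefined) external _CliGetChannelID (dynamically looked up)
                 (undefined) external _dispAllocChannel_STUB (dynamically looked up)
                 (undefined) external _dispFreeChannel_STUB (dynamically looked up)
                 (undefined) external _fifoLoadChannel_STUB (dynamically looked up)
"""

for prefix, symbols in group_channel_symbols(sample).items():
    print(prefix, symbols)
```

This only organizes names. It says nothing about the arguments or return values of these functions, which would still have to be worked out by disassembling whichever binary defines them.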

Hi Dan, it's good to see I'm not alone in wanting to investigate this f'king Fermi Freeze to get a full solution. I'd like to add some thoughts to this investigation. This freeze indeed doesn't happen on all Fermi cards. After the 10.8.3 update the number of affected cards just grew: it now affects GF104, GF114, GF106, GF116, and some GF108 and GF118 cards, but GF100 and GF110 are reported to still be immune to this problem. This high-end GPU family is the same one found in the Quadro 4000, which is the only officially supported card.

 

If the Nvidia drivers really use a unified architecture, the same driver should work for all sub-families of Fermi GPUs, but this time the driver for GF100 and GF110 seems to have some tweaks to avoid this freeze problem. Can we just use the device ID of the Quadro 4000? Can we patch these "tweaks" into the other GPUs?

 

Good luck 


I'm running with the MacPro5,1 SMBIOS and I do get freezes if I don't use my app. I have also tried MacPro4,1 and I got freezes with it, too.

 

You know what? Try ctrl + shift + eject to put your screen to sleep and then click using your mouse to show your login screen. Do that a couple of times and I'm pretty sure that you will eventually get a freeze…


This freeze indeed doesn't happen on all Fermi cards. After the 10.8.3 update the number of affected cards just grew: it now affects GF104, GF114, GF106, GF116, and some GF108 and GF118 cards, but GF100 and GF110 are reported to still be immune to this problem. This high-end GPU family is the same one found in the Quadro 4000, which is the only officially supported card.

 

 

...ok...just did 15 repetitions of the recipe control...shift...eject... no freeze...go figure... :smoke:

GT 430 is GF108, so I guess that's why... :)

 

__________________________________________________________________________________________

 

 

Hi Dan, it's good to see I'm not alone in wanting to investigate this f'king Fermi Freeze to get a full solution. I'd like to add some thoughts to this investigation. This freeze indeed doesn't happen on all Fermi cards. After the 10.8.3 update the number of affected cards just grew: it now affects GF104, GF114, GF106, GF116, and some GF108 and GF118 cards, but GF100 and GF110 are reported to still be immune to this problem. This high-end GPU family is the same one found in the Quadro 4000, which is the only officially supported card.

 

If the Nvidia drivers really use a unified architecture, the same driver should work for all sub-families of Fermi GPUs, but this time the driver for GF100 and GF110 seems to have some tweaks to avoid this freeze problem. Can we just use the device ID of the Quadro 4000? Can we patch these "tweaks" into the other GPUs?

 

Good luck

No, I don't think they have fixed the issue for GF100 and GF110. Most likely they broke something for unsupported GPUs while being busy optimizing for the supported ones…

 

But maybe we are wrong thinking that it's in the driver… If merely changing SMBIOS has worked for robertx, then maybe it's something in Quartz? I really don't think drivers behave differently based on SMBIOS…


I can't find any GTX 465, 470, 480 or GTX 570, 580 with this issue. In Lion up to 10.7.4 my GTX 460 was free of this issue. With Mountain Lion GM, 10.8.1 and 10.8.2 (no graphics updates since GM) it was also fine, but with the Nvidia retail drivers for 10.8.1 and 10.8.2 the freeze came back. When 10.8.3 came out, the Apple drivers started to have the same freeze.

 

With the GTX 560M in my laptop I tried 10.7.3 up to 10.7.5 and always had freezes. With the Mountain Lion DPs there was no freeze up to DP3; in DP3 update 1 they broke the framebuffer, and the sleep-screen and resolution-change problems started, and GM, 10.8.1 and 10.8.2 were the same. With the Nvidia web drivers for 10.8.1 and 10.8.2 there was no sleep or resolution-change problem, but the random freeze came back, and with Apple's official drivers in 10.8.3 up to the 10.8.5 F33 build: freeze, freeze, freeze :(

 

In Mavericks I also get this freeze with my GTX 560M; I haven't tried the GTX 460.

 

I tried AGPM edits and SMBIOS profiles (Mac Pro 3,1, 4,1 and 5,1); I even tried a faked MacBook Pro 8,4 so I could use the Nvidia retail drivers without freezes on video content.

 

Now I'm investigating the EFI strings injected for the Nvidia cards in real Macs; so far the freeze still happens. This investigation did give me brightness control and underscan, which are now working.

 

We need to first know which cards are affected and which cards are not affected so we can find some pattern to fix this.

 

Good luck


Hi, I want to add that even my old 9600 GT (NOT FERMI!!) had massive OpenGL channel timeouts (= freezes) with a beta 10.8.2 build. My new GT 430 hasn't had any on 10.8.4+.

Neither card is used with AGPM (no device ID added in AGPM = not used). But both cards use(d) at least the lowest and the highest GPU clock without AGPM! I think that's done by the card (BIOS) itself.

For my GT 430 that means 50 MHz (idle = 2D desktop) and 900 MHz. I believe that getting AGPM working can't fix freezes on affected cards. It may only avoid some short slowness (the first half second until the GPU switches from 50 MHz to full speed) when running OpenGL tasks.



...adding AGPM adds the 405 MHz GPU core clock...perhaps smoother transition from idle to full load and back... :smoke:


  • 2 weeks later...

I'm testing Mavericks DP6 right now, and after fixing this issue http://www.insanelymac.com/forum/topic/291423-dp6-graphics-crashing/ (which is not related to the Fermi problems) my laptop is pretty stable, with not a single Fermi freeze since I upgraded on August 23. I hope it stays this way until GM.

 

Please try Mavericks DP6 and report back, or wait for DP7, maybe this Wednesday.

 

Good Luck


Have you tested with DP7 already?

 

If it works fine with our Fermi cards, I'm seriously thinking of installing it. Is it stable enough for daily usage? I'm using iOS 7 on my phone and it's fine, but I've got no idea about Mavericks - perhaps I could test it on my MacBook first… Also, will I be able to update my install to the official version that I'll have bought from App Store? I know that Apple says no, but…


Yes, I have. DP7 seems as stable as DP6 was on my laptop: not a single freeze in 2 days, and not a single freeze with DP6 either. I think they have actually fixed it. My theory is:

 

-In Mountain Lion DP3 update 2 they introduced Kepler support and broke something on the Fermi side. The issue remained for the whole life of Mountain Lion; there are reports of it even on real Macs.

 

-With Mavericks they split the OpenGL and video acceleration drivers in two, one for the old NV50 (Tesla) cards and the other for Fermi and Kepler cards, I think because they wanted to fix this, but this didn't fix the freeze for me; this applies to DP1 through DP4. In DP5 they also split the NVDAResman driver, one for the old NV50 (Tesla) cards and the other for Fermi and Kepler cards. This is when I started running my system without your freeze fix app: not a single freeze in Safari, and the full-screen animation works every time (this used to be where I got most of the freezes). But I still had some freezes when waking the system from sleep.

 

And in DP6 (which I have been using since it came out two weeks ago) not a single freeze, and everything works really well, so I started using it as my main system. In DP7 everything works well too.

 

So if you ask me, I'd say start using Mavericks right now, and please let us know how it works for you and whether the freeze is gone as it is for me.

 

 

Good Luck


Okay, so I've installed it on my MacBook and it seems to be fine; I can't wait to test it with my Fermi card. I'm still trying to figure out how to change the partition scheme from MBR to GUID on my hackintosh (on my data HDD; I have a GUID SSD for the system), so that I can clone my current ML partition to the HDD and boot it in case anything goes wrong… It also appears that you can't resize partitions if you have MBR… I've tried a utility called gdisk in Ubuntu booted from a flash drive, but it wanted me to move or resize my first partition… I tried to do so, but I gave up, since it appeared to take forever. Thankfully, my data is, or at least appears to be, intact.

 

By the way, did you have any problems with sound? How ironic that you have a fully working hackintosh laptop and I don't get sound on my REAL MacBook Pro… :)


Well, I patched my current AppleHDA for my ALC269 and it is fully working: speakers, headphones, internal and external mic. I use an IOAudioFamily kext by EMlyDinEsH http://forum.osxlatitude.com/index.php?/topic/1970-fix-for-audio-issue-after-sleep-in-alc269/ for the audio issue after sleep.

 

I also followed Toleda's directions to fix HDMI on my GTX 560M, so no problems with audio here.

 

Cheers


I've fixed it already. I also had problems with Notes.app and many other things not working properly. It turns out that the "combo" update that I had found on the internet hadn't really worked properly (perhaps it was a delta labeled as a combo), so I've just installed the delta updates one by one to get to DP7 and it works great now!

 

Anyway, I want to test it on my hackintosh, but I need to find out what to do with my data… For some reason the data partition seems to take up the whole HDD in Disk Utility.app and I can't change its size, but it is only 650 GB and I want to make it 1 TB - 128 GB (the size of my SSD; I want to clone its contents to the HDD so I can boot off it if anything goes wrong with Mavericks). It seems that it wants GUID, not MBR, so I've tried an app called gdisk in Ubuntu that is supposed to convert MBR to GUID, but it's complaining that my first partition overlaps some GUID sectors… So the only way seems to be to move the partition a few kilobytes to the right, but the problem is that it takes forever to complete…

 

Here's a screenshot of my HDD in the Disk Utility.app (I've switched the language to English for you :)):

 

[Screenshot of the HDD in Disk Utility.app, 2013-09-09]


Almost two more weeks without a single freeze using DP7. So, has the Fermi freeze issue actually been fixed???

 

At least for me, since DP6 and now DP7.

 

Good luck

There is just one downside: sometimes it doesn't occur for several days. Then it happens again. :(

 

In Linux there are also problems with channel timeouts, but only on suspend. I think that nVidia just hasn't optimized the drivers for 'us' Fermi users. Drivers can do much more than a few years ago.


Well, it could happen again, I know that. For me the record was 7 days, but this time almost a month has passed since the last freeze. I call on all Fermi users out there to try it out so we can find an answer.

 

Good luck


I keep getting those damn freezes in Xcode...  :(

Are you using a 2 display configuration?, are the same NVDA Channel Timeout messages in log?, how do you inject your card? GE=yes? DSDT?

 

Sorry to hear that


Are you using a 2 display configuration?

No.

are the same NVDA Channel Timeout messages in log?

Yes. I got some NVDA(private) and some NVDA(OpenGL) channel exceptions...

how do you inject your card? GE=yes? DSDT?

I use Clover's graphics injection, which should be equivalent to Chameleon's GE. I don't use any custom DSDT; I only have the sound fix, which Clover applies automatically. (Everything including sleep works for me :))

 

 

When I finish my work in Xcode, I'll kill freezefix and see how it does. When I use Xcode, my system usually freezes after a while. Could you please try using Xcode for a while?
