
Tracing back the AMD GPU wakeup issue to its origin


Mieze
368 posts in this topic


  • 2 weeks later...
  • 4 months later...
On 10/23/2017 at 10:04 PM, Slice said:

@Mieze

It's fantastic work!

Confirm working with my Radeon 6450 in Sierra.

I just made this patch as SSDT

SSDT-AMD.aml.zip

 

My Asus P5Q-Pro motherboard doesn't have an integrated GPU; I have a Radeon HD 6870. Clover injects device GFX0 and the Gibba framebuffer at the device path shown by ioreg.

 

Accordingly, I have modified SSDT-AMD.aml as in the snippet below (a best guess after gathering information from several related threads). After making this change I forced sleep from the Apple menu. The machine appeared to go to sleep but woke up immediately (which is likely a separate issue). The important thing is that the display came back, which suggests that the SSDT is working.

 

However, when the machine went to sleep on its own after 30 minutes of inactivity, it did not come back. The fans etc. were running, but there was no display. I hit some keys and plugged/unplugged a USB drive, but got no response. I could not even ssh into the machine from another machine, so I had to do a hard reboot.

 

My question is whether my SSDT edit is correct, or whether the device path should be _SB_.PCI0.P0P2.PEGP or something else. Any pointers on the machine being unresponsive after auto sleep are much appreciated!

 

DefinitionBlock ("", "SSDT", 2, "Apple", "Radeon", 0x00003000)
{
    External (_SB_.PCI0.P0P2.GFX0, DeviceObj)    // (from opcode)

    Scope (\_SB.PCI0.P0P2.GFX0)
    {
        OperationRegion (PCIB, PCI_Config, Zero, 0x0100)
        Field (PCIB, AnyAcc, NoLock, Preserve)
        {
            Offset (0x10), 
            BAR0,   32, 
            BAR1,   32, 
            BAR2,   64, 
            BAR4,   32, 
            BAR5,   32
        }

        // Snippet ends here; the rest of SSDT-AMD.aml is not shown.
    }
}

 

 

IOReg Gibba.png

Edited by firefox-bin

  • 1 month later...

@Mieze

Would you please provide instructions on how to dump the BARs and the GPU's control register space, and how to change them?

This is my problem:
 

https://www.insanelymac.com/forum/topic/334965-amd-firepro-w4100-high-cinebench-r15-opengl-if-first-run-under-windows/


I believe that your generic method would be helpful in resolving my problem. The intention is to compare the GPU's registers (BARs etc.) between a soft reboot from Windows and a cold High Sierra boot, and then to patch the DSDT.

I've read the thread and am surprised that no one has inquired about the tools that you have used.

Thank you.
 


It may be different tools.

RW-Everything in Windows.

RadeonDump in macOS.

Or somehow ACPI debugging.

You also may write your own kext for macOS to dump PCI and MMIO registers into IOLog in macOS.

You may modify Clover sources to dump PCI and MMIO registers into boot.log before any OS.


Thank you Slice.

I will examine RadeonDump; it's included in DarwinDumper.

I know that this is off-topic.

At the moment I am using Windows 10 as a glorified graphics microcode loader (GuC). Is there any way to capture the microcode that is loaded by Windows for the AMD FirePro W4100, and then use Clover or macOS to write the microcode to the W4100?

The loading of the microcode is the key to unlocking the full 3D performance of the W4100.


  • 2 weeks later...
  • 2 months later...

Since I replaced my R9 270X with an RX570 half a year ago, I have noticed this strange kernel message showing up after wakeup from sleep:

kernel: AppleHDAHDMI_DPDriver::setPowerState(0xdf0b895bf4bcbbf9, 0 -> 1) timed out after 10134 ms
kernel: AppleHDAHDMI_DPDriver::setPowerState(0xdf0b895bf4bcbbf9, 0 -> 1) timed out after 10134 ms

Although it doesn't seem to do much harm to the system (DisplayPort audio worked well anyway), I'm curious like all cats ;) and decided to investigate the issue. As I'm using system definition iMac18,3, the most obvious step was to take a look at the IOReg dump of an original Apple machine of that type and compare it with mine. Checking device HDEF, I noticed that the iMac18,3 no longer has the "hda-gfx" property on that device but a "No-hda-gfx" property instead, so I decided to edit my HDEF's _DSM method to adopt this change, and it made the error message after wakeup disappear.

            Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
            {
                If (LEqual (Arg2, Zero))
                {
                    Return (Buffer (One)
                    {
                         0x03                                           
                    })
                }

                Return (Package (0x06)
                {
                    "No-hda-gfx", 
                    Buffer (0x08)
                    {
                         0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 
                    }, 

                    "layout-id", 
                    Buffer (0x04)
                    {
                         0x01, 0x00, 0x00, 0x00                         
                    }, 

                    "PinConfigurations", 
                    Buffer (Zero) {}
                })
            }

Mieze

Edited by Mieze

On 8/1/2018 at 10:10 AM, Slice said:

It may be different tools.

RW-Everything in Windows.

RadeonDump in macOS.

Or somehow ACPI debugging.

You also may write your own kext for macOS to dump PCI and MMIO registers into IOLog in macOS.

You may modify Clover sources to dump PCI and MMIO registers into boot.log before any OS.

I just saw your post and wanted to share this piece of code, which I used to dump the GPU's register space in order to investigate the wakeup issue. It adds the dump as property GBUF to the GPU's IORegistry entry while the system is booting, so that it may be retrieved using IORegistryExplorer later. Have fun!

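                Device (PEGP)
                {
                    Name (_ADR, Zero)  // _ADR: Address
                    // PCI configuration space of the GPU
                    OperationRegion (PCIB, PCI_Config, Zero, 0x0100)
                    Field (PCIB, AnyAcc, NoLock, Preserve)
                    {
                        Offset (0x04), 
                        CMDR,   16,     // PCI command register
                        Offset (0x10), 
                        BAR0,   32, 
                        BAR1,   32, 
                        BAR2,   64      // 64-bit BAR at config offset 0x18
                    }

                    // BAR2 with its low four flag bits masked off is the register base
                    OperationRegion (GREG, SystemMemory, And (BAR2, 0xFFFFFFFFFFFFFFF0), 0x8000)
                    Field (GREG, AnyAcc, NoLock, Preserve)
                    {
                        Offset (0x681C), 
                        SBS0,   32
                    }

                    // 64 KiB window over the same base (524288 bits = 0x10000 bytes)
                    OperationRegion (GREF, SystemMemory, And (BAR2, 0xFFFFFFFFFFFFFFF0), 0x00010000)
                    Field (GREF, AnyAcc, NoLock, Preserve)
                    {
                        GSRC,   524288
                    }

                    Name (GBUF, Buffer (0x00010000) {})
                    Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
                    {
                        // Set bit 1 (Memory Space Enable) in the command register
                        Store (Or (CMDR, 0x02), CMDR)
                        Sleep (One)
                        // Copy the register window into GBUF
                        CopyObject (GSRC, GBUF)
                        Store (Package (0x02)
                            {
                                "GBUF", 
                                GBUF
                            }, Local0)
                        // DTGP is not defined in this snippet; it must exist elsewhere in the ACPI tables
                        DTGP (Arg0, Arg1, Arg2, Arg3, RefOf (Local0))
                        Return (Local0)
                    }
                }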
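
Once the GBUF property has been saved from the IORegistry, individual registers can be picked out of the 64 KiB dump offline. The following is only a rough Python sketch of that step, not part of the original patch: it assumes the property has already been exported as raw bytes (how you export it is up to you), the helper name read_reg32 is made up for illustration, and the demo dump is synthetic. It reads the 32-bit value at offset 0x681C, the same offset the ASL names SBS0.

```python
import struct

DUMP_SIZE = 524288 // 8  # GSRC is 524288 bits wide, i.e. 0x10000 bytes


def read_reg32(dump: bytes, offset: int) -> int:
    """Return the little-endian 32-bit value at byte offset `offset`."""
    if offset % 4 != 0 or offset + 4 > len(dump):
        raise ValueError("offset must be 4-byte aligned and inside the dump")
    return struct.unpack_from("<I", dump, offset)[0]


# Synthetic stand-in for a real GBUF export: all zeroes except one register.
demo = bytearray(DUMP_SIZE)
struct.pack_into("<I", demo, 0x681C, 0x12345678)

print(hex(read_reg32(bytes(demo), 0x681C)))
print("dump is all zeroes:", not any(demo))
```

A dump that consists only of zeroes usually means the window was not pointing at the register space at all, so the all-zero check is worth running before comparing individual registers.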

Mieze


On 11/13/2018 at 12:11 PM, Mieze said:

Since I replaced my R9 270X with an RX570 half a year ago, I have noticed this strange kernel message showing up after wakeup from sleep:


kernel: AppleHDAHDMI_DPDriver::setPowerState(0xdf0b895bf4bcbbf9, 0 -> 1) timed out after 10134 ms
kernel: AppleHDAHDMI_DPDriver::setPowerState(0xdf0b895bf4bcbbf9, 0 -> 1) timed out after 10134 ms

Although it doesn't seem to do much harm to the system (DisplayPort audio worked well anyway), I'm curious like all cats ;) and decided to investigate the issue. As I'm using system definition iMac18,3, the most obvious step was to take a look at the IOReg dump of an original Apple machine of that type and compare it with mine. Checking device HDEF, I noticed that the iMac18,3 no longer has the "hda-gfx" property on that device but a "No-hda-gfx" property instead, so I decided to edit my HDEF's _DSM method to adopt this change, and it made the error message after wakeup disappear.


            Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
            {
                If (LEqual (Arg2, Zero))
                {
                    Return (Buffer (One)
                    {
                         0x03                                           
                    })
                }

                Return (Package (0x06)
                {
                    "No-hda-gfx", 
                    Buffer (0x08)
                    {
                         0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 
                    }, 

                    "layout-id", 
                    Buffer (0x04)
                    {
                         0x01, 0x00, 0x00, 0x00                         
                    }, 

                    "PinConfigurations", 
                    Buffer (Zero) {}
                })
            }

Mieze

It can also be done with Clover's config.plist, under Devices->Properties:

	<key>PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)</key>
	<dict>
		<key>No-hda-gfx</key>
		<data>
		b25ib2FyZC0xAA==
		</data>
	</dict>

Yes?
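
For anyone comparing the two variants: the data value in the plist fragment above is Base64, and it does not decode to the same bytes as the buffer in the _DSM posted earlier. A small Python sketch (standard library only, not taken from any post in this thread) decodes the plist payload and also encodes the eight zero bytes from the _DSM, in case you want to inject exactly that value instead.

```python
import base64

# Payload copied from the <data> element of the plist fragment above.
payload = base64.b64decode("b25ib2FyZC0xAA==")
print(payload)  # b'onboard-1\x00'

# Value used by the _DSM variant earlier in the thread: eight zero bytes.
dsm_value = bytes(8)
encoded = base64.b64encode(dsm_value).decode("ascii")
print(encoded)  # AAAAAAAAAAA=
```

So the plist injects the NUL-terminated string "onboard-1", while the _DSM injects eight zero bytes. Whether the driver cares about the difference is not established in this thread.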


6 hours ago, Slice said:

It can be done by clover's config.plist Device->Properties


	<key>PciRoot(0x0)/Pci(0x1,0x0)/Pci(0x0,0x1)</key>
	<dict>
		<key>No-hda-fgx</key>
		<data>
		b25ib2FyZC0xAA==
		</data>

Yes?

 

Typo? Is "fgx" really meant to be "no-hda-gfx"?


2 hours ago, Gigamaxx said:

 

Typo? Is "fgx" really meant to be "no-hda-gfx"?

Of course it's a typo and it should read "No-hda-gfx", but apart from that it can be injected in the same way as other properties.

 

Mieze


  • 2 years later...
Posted (edited)
On 11/13/2018 at 7:24 AM, Mieze said:

I just saw your post and wanted to share this piece of code, which I used to dump the GPU's register space in order to investigate the wakeup issue. It adds the dump as property GBUF to the GPU's IORegistry entry while the system is booting, so that it may be retrieved using IORegistryExplorer later. Have fun!



                Device (PEGP)
                {
                    Name (_ADR, Zero)  // _ADR: Address
                    OperationRegion (PCIB, PCI_Config, Zero, 0x0100)
                    Field (PCIB, AnyAcc, NoLock, Preserve)
                    {
                        Offset (0x04), 
                        CMDR,   16, 
                        Offset (0x10), 
                        BAR0,   32, 
                        BAR1,   32, 
                        BAR2,   64
                    }

                    OperationRegion (GREG, SystemMemory, And (BAR2, 0xFFFFFFFFFFFFFFF0), 0x8000)
                    Field (GREG, AnyAcc, NoLock, Preserve)
                    {
                        Offset (0x681C), 
                        SBS0,   32
                    }

                    OperationRegion (GREF, SystemMemory, And (BAR2, 0xFFFFFFFFFFFFFFF0), 0x00010000)
                    Field (GREF, AnyAcc, NoLock, Preserve)
                    {
                        GSRC,   524288
                    }

                    Name (GBUF, Buffer (0x00010000) {})
                    Method (_DSM, 4, NotSerialized)  // _DSM: Device-Specific Method
                    {
                        Store (Or (CMDR, 0x02), CMDR)
                        Sleep (One)
                        CopyObject (GSRC, GBUF)
                        Store (Package (0x02)
                            {
                                "GBUF", 
                                GBUF
                            }, Local0)
                        DTGP (Arg0, Arg1, Arg2, Arg3, RefOf (Local0))
                        Return (Local0)
                    }
                }

Mieze

 

Hi, I know this is an old thread, but I have a problem with my laptop's MXM card and was hoping @Mieze could help me. Everything on my laptop works perfectly under Windows, Linux and Mojave, but Catalina and Big Sur refuse to boot with the DGPU activated (the boot stalls just after the verbose output). Catalina and Big Sur work fine on the IGPU only.

After many tests I found that editing the outputs in the VBIOS ROM for a different DGPU model made Catalina work (but it breaks Windows/Linux, is slower, bricks the laptop in discrete mode, etc.). So I want to be able to use the real ROM. For this I need to figure out why the "Catalina-booting" ROM boots: is it because something is different in the memory space and that enables the driver, or is it because something is broken in the low-level part of the ROM and that makes it work once the OS driver kicks in (an initialization issue)?

I have a couple of Catalina-working and non-working ROMs to compare, but without knowing what the OS can or cannot see, it is a bit like flying in the dark.

After reading this topic I think you used what could be the right approach to getting at least part of this information, so I tried this patch, but all I get from GBUF is zeroes.

I used my system's AMDSGTBL SSDT as a reference, added your code and modified it to use BAR3, and got the same result. I know the patch itself is working (I can dump the PCIB field's values), so I guess I'm just not pointing it at the right memory range.

 

The good ROM for everything but Catalina is the 4170.

The ROM that breaks most things but works in Catalina is the 4150.

 

PS: long live the cats!

Catalina-DGPU.zip

 

 

 

edit:

I included a Mojave dmesg log from a boot with the 4150 ROM in clamshell (lid closed) mode, where the system stalled at the exact same spot at which Catalina stalls with the 4170 ROM. When I opened the lid it was still stalled, with a picture only on the monitor. The difference is that when I pressed the power button for 2 seconds, the display output turned off for a second, then the LCD kicked in, the drivers finished loading, and everything went back to "normal".

So this could be just a weird glitch, or it could be part of the mystery that is causing the stall. In clamshell-closed mode the DGPU is already initialized and working by the time the OS drivers kick in, so it could be a clue.

My guess is that the 4170 ROM does not initialize correctly under Catalina because the driver finds something in memory that stops it, whereas if the low-level part isn't working, the card is left in an uninitialized state and the OS drivers then work.

Edited by theroadw
more info

@theroadw First of all you have to find the correct BAR, i.e. find out which of them points to the GPU register space. AFAIK it's BAR2 on older models and BAR5 on newer ones. You might want to use RWEverything in Win10 to check which BAR is used.
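
When reading raw BAR values out of RWEverything or a PCI config dump, the low bits are flags, not address bits. Below is a minimal Python sketch of the standard PCI BAR layout, not code from this thread: decode_bar is a made-up helper name and the example values are invented. It applies the same low-four-bit mask that the ASL above uses in And (BAR2, 0xFFFFFFFFFFFFFFF0).

```python
def decode_bar(low: int, high: int = 0) -> dict:
    """Decode a raw PCI Base Address Register value.

    `low` is the 32-bit value of the BAR slot itself, `high` the value of
    the following slot (only used when the BAR is a 64-bit memory BAR).
    """
    if low & 0x1:
        # Bit 0 set: I/O space BAR; the low two bits are flags.
        return {"space": "io", "base": low & 0xFFFFFFFC}
    is_64bit = ((low >> 1) & 0x3) == 0x2  # type field in bits 2:1
    raw = ((high << 32) | low) if is_64bit else low
    return {
        "space": "memory",
        "width": 64 if is_64bit else 32,
        "prefetchable": bool(low & 0x8),  # bit 3
        "base": raw & 0xFFFFFFFFFFFFFFF0,  # drop the four flag bits
    }


# Invented example values, not taken from any real card.
mem = decode_bar(0xFBE00004, 0x00000000)
print(mem["width"], hex(mem["base"]))
io = decode_bar(0x0000E001)
print(io["space"], hex(io["base"]))
```

Note that a 64-bit BAR occupies two consecutive slots, which is why the field lists in this thread declare BAR2 as 64 bits wide and continue with BAR4.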

 

Mieze :cat: 


Posted (edited)

Thank you for your help. I ran RWE and this is what I get:

 

RWE-4170.png

 

So I edited the SSDT, but I don't know which BAR is used or where to get the display control offset.

 

ScreenShot2021-05-21at3_28_58PM.png

 

Edited by theroadw