
Clover test and patches for Polaris GPU


fantomas

Did you guys check in the Clover log whether it is using the correct region5 as MMIO space?

 

I'm testing with Clover rev 4035, and it always uses region2 on a Bonaire-family card.

	if (rdev->family >= CHIP_BONAIRE) {
		rdev->rmmio_rid = PCIR_BAR(5);
	} else {
		rdev->rmmio_rid = PCIR_BAR(2);
	}

The RadeonPCI.kext is a great tool - both versions work - thanks to Slice.

 

I believe the card init issue might be fixed by looking at the driver memory mapping - I'm working on it at the moment, but not for this card.

It is here

  DBG("PCI region 1 = 0x%8X, region3 = 0x%8X, region5 = 0x%8X\n", Reg1, Reg3, Reg5);
  if (card->info->chip_family >= CHIP_FAMILY_HAINAN && Reg5 != 0) {
    card->mmio = (UINT8 *)Reg5;
    DBG("Use region5 as MMIO space\n");
  }
  pci_dev->regs = card->mmio;

Link to comment
Share on other sites

Made some more tests with my Sapphire RX480 Nitro:

 

Let's take a look at the system_profiler dumps once booted with the RX480 as primary and only GFX (CSM disabled in BIOS):

    | |   |   | | +-o AMDFramebufferVIB  <class AMDFramebuffer, id 0x100000487, registered, matched, active, busy 0 (3 ms), retain 11>
    | |   |   | |   | {
    | |   |   | |   |   "IOClass" = "AMDFramebuffer"
    | |   |   | |   |   "CFBundleIdentifier" = "com.apple.kext.AMDFramebuffer"
    | |   |   | |   |   "IOProviderClass" = "AtiFbStub"
    | |   |   | |   |   "IOPMStrictTreeOrder" = Yes
    | |   |   | |   |   "IOPowerManagement" = {"CapabilityFlags"=0x0,"CurrentPowerState"=0x0}   <--------------- !!! this part is relevant !!!
    | |   |   | |   |   "IOCFPlugInTypes" = {"ACCF0000-0000-0000-0000-000a2789904e"="IOAccelerator2D.plugin"}
    | |   |   | |   |   "IOFBDependentIndex" = 0x0
    | |   |   | |   |   "IOProbeScore" = 0xfe1a
    | |   |   | |   |   "IONameMatch" = "display"
    | |   |   | |   |   "IOFBGammaHeaderSize" = 0x0
    | |   |   | |   |   "IOMatchCategory" = "IOFramebuffer"
    | |   |   | |   |   "iofb_version" = "1.1.42"
    | |   |   | |   |   "IOFBDependentID" = 0x1000001f6
    | |   |   | |   |   "IOFBGammaWidth" = 0xc
    | |   |   | |   |   "IOAccelTypes" = "IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/PEG0@1/IOPP/GFX0@0/AMDRadeonX4100_AMDBaffinGraphicsAccelerator"
    | |   |   | |   |   "IONameMatched" = "display"
    | |   |   | |   |   "IOAccelRevision" = 0x2
    | |   |   | |   |   "IOFBGammaCount" = 0x100
    | |   |   | |   |   "IOAccelIndex" = 0x0
    | |   |   | |   | }

and here the same with IGPU set as primary GFX, RX480 as secondary (and CSM enabled in BIOS):

    | |   |   | | +-o AMDFramebufferVIB  <class AMDFramebuffer, id 0x100000456, registered, matched, active, busy 0 (10 ms), retain 18>
    | |   |   | |   | {
    | |   |   | |   |   "IOFBScalerInfo" = <000000000000000000000000000000002e00000000200000002000000000000000000000000000000000000000000000>
    | |   |   | |   |   "IOPMStrictTreeOrder" = Yes
    | |   |   | |   |   "av-signal-type" = <10000000>
    | |   |   | |   |   "IOFBMemorySize" = 0x10000000
    | |   |   | |   |   "ATY,fb_offset" = <0000300000000000>
    | |   |   | |   |   "IOFBUIScale" = <02000000>
    | |   |   | |   |   "audio-codec-info" = <00010b00>
    | |   |   | |   |   "IOFBDependentIndex" = 0x0
    | |   |   | |   |   "IOFBGammaHeaderSize" = 0x0
    | |   |   | |   |   "boot-gamma-restored" = <00000000>
    | |   |   | |   |   "IOFBGammaCount" = 0x100
    | |   |   | |   |   "IOFBCurrentPixelCount" = 0x879ec0
    | |   |   | |   |   "IOFBCLUTDefer" = Yes
    | |   |   | |   |   "IOFramebufferOpenGLIndex" = 0x3
    | |   |   | |   |   "IONameMatched" = "display"
    | |   |   | |   |   "IOFBI2CInterfaceInfo" = ({"IOI2CBusType"=0x2,"IOI2CSupportedCommFlags"=0x2,"IOI2CTransactionTypes"=0x1f,"IOI2CInterfaceID"=0x0})
    | |   |   | |   |   "ATY,fb_linebytes" = <003c0000>
    | |   |   | |   |   "IODisplayParameters" = "IOFramebufferParameterHandler is not serializable"
    | |   |   | |   |   "startup-timing" = <>
    | |   |   | |   |   "IOAccelTypes" = "IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/PEG0@1/IOPP/GFX0@0/AMDRadeonX4100_AMDBaffinGraphicsAccelerator"
    | |   |   | |   |   "IOPowerManagement" = {"ChildrenPowerState"=0x2,"MaxPowerState"=0x2,"CurrentPowerState"=0x2,"CapabilityFlags"=0x8000,"ChildProxyPowerState"=0x2,"DriverPowerState"=0x1}   <--------------- !!! this part is relevant !!!
    | |   |   | |   |   "IOFBCurrentPixelClock" = 0x1fc8bfd0
    | |   |   | |   |   "IOFBGammaWidth" = 0xc
    | |   |   | |   |   "IOFBDependentID" = 0x1000001f6
    | |   |   | |   |   "IOAccelIndex" = 0x0
    | |   |   | |   |   "graphic-options" = 0x0
    | |   |   | |   |   "IOFBWaitCursorFrames" = 0x1d
    | |   |   | |   |   "IOFBConfig" =
    | |   |   | |   |   "IOFBWaitCursorPeriod" = 0x1fca055
    | |   |   | |   |   "IOFBProbeOptions" = 0x401
    | |   |   | |   |   "IOFBNeedsRefresh" = Yes
    | |   |   | |   |   "IOFBTransform" = 0x0
    | |   |   | |   |   "IOAccelRevision" = 0x2
    | |   |   | |   |   "IOFBI2CInterfaceIDs" = (0x300000000)
    | |   |   | |   |   "IOCFPlugInTypes" = {"ACCF0000-0000-0000-0000-000a2789904e"="IOAccelerator2D.plugin"}
    | |   |   | |   |   "IOProviderClass" = "AtiFbStub"
    | |   |   | |   |   "CFBundleIdentifier" = "com.apple.kext.AMDFramebuffer"
    | |   |   | |   |   "IOFBCursorInfo" = ()
    | |   |   | |   |   "IONameMatch" = "display"
    | |   |   | |   |   "IOFBTimingRange" = <>
    | |   |   | |   |   "IOClass" = "AMDFramebuffer"
    | |   |   | |   |   "IOFBDetailedTimings" = (<
    | |   |   | |   |   "IOFBCurrentPixelCountReal" = 0x879ec0
    | |   |   | |   |   "IOGeneralInterest" = "IOCommand is not serializable"
    | |   |   | |   |   "IOMatchCategory" = "IOFramebuffer"
    | |   |   | |   |   "IOProbeScore" = 0xfe1a
    | |   |   | |   |   "ATY,fb_size" = <0040fa0100000000>
    | |   |   | |   |   "iofb_version" = "1.1.42"
    | |   |   | |   | }

Please pay attention to the "IOPowerManagement" part. With the RX480 set as primary, as you all know, the attached screens will NOT get initialized and don't show any desktop. Right after verbose boot you might think the system has crashed... but it hasn't. I was still able to connect from my MacBook via SSH to pull these "system_profiler" dumps from my Hackintosh.

 

I just played around the last few days with some settings you can find within the related AMD...kext files; the best example is AMD9510Controller.kext. If you take a look into its Info.plist, you will find various settings like:

 

 

 

Bildschirmfoto_2017_03_19_um_16_39_47.pn

 

 

 

 

My thought was: maybe changing one of those setting values would get us to the goal we are all searching for. Tried a lot of setting changes with, you might guess it... NO success.

Then I took the Hopper Disassembler to look into the various kext binaries, and what shall I say: there are A LOT MORE settings to find (most importantly in AMD9510Controller.kext and AMDRadeonX4100.kext). Some of them are self-explanatory by their names, others not.

 

You can also see that "aty_config" and "aty_properties" appear twice: once for general use, and once under "ATY,Berbice" for framebuffer-specific settings. Let's take CFG_USE_AGDC as an example. Globally it is set to "CFG_USE_AGDC    Boolean    YES" <--- that means the driver will use the AppleGraphicsDeviceController policy if no special framebuffer was provided. Now if you copy this line into "aty_config" under "ATY,Berbice" and switch the setting to "CFG_USE_AGDC    Boolean    NO", you can force the driver NOT to use the AppleGraphicsDeviceController policy. If you do so, the AGDC policy is no longer respected, which means you don't need any patch for it! Test it yourself.
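For illustration only, the override described above would sit in the controller kext's Info.plist roughly like this (the exact nesting and the sibling keys should be copied from the stock plist - treat this fragment as a sketch, not the verbatim file):

```xml
<key>ATY,Berbice</key>
<dict>
	<key>aty_config</key>
	<dict>
		<!-- Copied from the global aty_config and flipped to NO:
		     the driver then skips the AppleGraphicsDeviceController
		     policy for the Berbice framebuffer, no AGDC patch needed. -->
		<key>CFG_USE_AGDC</key>
		<false/>
	</dict>
</dict>
```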

 

So I tested a lot of different settings with different values, but none of them are related to our main problem: initializing displays with RX4x0 cards as primary GFX. Wait, one was interesting: "CFG_PAA", which is set to "Number 0". I took this setting, copied it into "ATY,Berbice", set it to "Number 1" and rebooted the system. Now none of my screens were initialized, even with the settings of a helper card. Removed this value and all went back to normal. For some settings you can only use "0" or "1", meaning "don't use" or "use". If the setting name contains "Disable", these values work vice versa.

 

Ok, back to the main story: as you can see in the first part, when I booted my Sierra setup with only the RX480 as primary GFX, I found that IOPowerManagement was set to just two values, and both of them tell us that the power state is ZERO, as are the capability flags (for which I don't know what they stand). Now I need to know whether these values are produced by some "if-this-then-that (je, jne, jmp)" logic in the binary, or whether we have a chance to provide them through some Info.plist value we can set. If so, we could give them the same values you can see in the part where the system was booted with a helper card (IGPU or similar) <--- and this is the point where I need your advice, because of my own "noob" status when it comes to coding skills.


  • 2 weeks later...

Today my eGPU enclosure "AKiTiO Node" arrived. Now I can run tests with my MacBook Pro late 2013 and the Sapphire RX460 NITRO card. I just connected the enclosure via a USB-C-to-Thunderbolt2 adapter and installed the eGPU script (by Goalque) under Sierra 10.12.5 beta 1. I first had to make some modifications to the script, because it still tries to modify AMD9500Controller.kext. But after pointing the script at the correct kext name, it installs flawlessly and the card runs well on the MacBook Pro:

Bildschirmfoto_2017-03-30_um_21.38.25.pn

Bildschirmfoto_2017-03-30_um_21.38.36.pn

Next step: get the AKiTiO Node enclosure correctly detected by my Hackintosh via the Gigabyte Alpine Ridge Thunderbolt3 PCIe card. So far it is detected by the card, but the script does not recognize that the enclosure is there and stops with: "HotPlug the Thunderbolt cable and try again" ;-(

 

EDIT#1: Luxmark test with only the Sapphire RX460 (36 CUs patch enabled) - but remember: the eGFX is connected to the MacBook Pro late 2013 via Thunderbolt2:

 

luxmark_rx460_36_CUs.jpg

 

Maybe also interesting for our coders: the device path from Thunderbolt

IODevice_Path.jpg

So if one of the coders is interested in some settings you want me to try, with the results posted here afterwards: don't hesitate to ask :wink_anim:



Ok, got some interesting news here:

 

Got the following config here:

• ASUS Maximus Extreme VIII Motherboard

• Sapphire RX480 Nitro 8GB in PCIe Slot#1

• Sapphire RX460 Nitro 4GB in AKiTiO Node Thunderbolt3 enclosure, connected to a Gigabyte Alpine Ridge TB3 card in PCIe slot#3

• SIERRA 10.12.5 beta 1 (rev. 16F43c)

• CLOVER latest Rev. 4047

• CSM disabled in BIOS

• set PEG as primary GFX (that's the RX480 in PCIe slot#1)

• patched only AMD9500Controller.kext to recognize Device-ID 0x67DF1002

• NOT patched AMDRadeonX4100.kext with Device-ID 0x67DF1002

• patched some of the AMD-related kext with the following entry: IOPCITunnelCompatible = true

• NOT patched AMDRadeonX4100 to get 36 CUs for the RX460

 

Booted the system via CLOVER right into the desktop. Now take a look at the System Information entries for both cards:

 

RX480 first:

Systeminformation_480.jpg

As you can see: NO Metal support. Now let's see the RX460:

Systeminformation_RX460.jpg

Yes, it is true: this card HAS Metal support !

Ok, so now see the entries for both cards from IORegExplorer - again RX480 (in slot#1) first:

IOREG_PCI_1.jpg

NO acceleration for this card, because we don't have any AMDRadeonX4100 entries. Let's go for the RX460 (via TB3 Node):

IOREG_PCI_3_TB3.jpg

Again: full support. As you can see, we HAVE entries for AMDRadeonX4100.

 

And: I could run any benchmark tool like HEAVEN or LuxMark at full speed on all 4 connected monitors!

 

But there is one thing to mention: if I patch AMDRadeonX4100.kext to recognize 0x67DF1002 (the RX480 device id),

I get stuck on boot just before reaching the desktop, with the same effect as running one of the RX cards without any helper card.

Sorry guys... but I will try hard to find out what causes the problem.

 

And even though there are no AMDRadeonX4100 entries for the RX480 card, I still believe it runs as if there were, because that kext is fully loaded and activated for the RX460 <--- but I wouldn't bet my hands on it.

 

PS: forgot to mention that I get BERBICE for the RX480, but AMDRadeonFramebuffer for the RX460 - which is ok, because both cards need different connector patches - made possible by CLOVER's "Arbitrary" function, which I will try later this weekend.

 

PPS: full 8-channel HDMI audio output support on the RX480 (thanks to a correctly patched DSDT and CSM disabled in BIOS):

8-channel_HDMI-audio.jpg


Hi Slice,

 

maybe this might be interesting for you - my Clover bootlog when booting with the RX480 as primary GFX in slot#1 and the RX460 connected via the Thunderbolt AKiTiO Node enclosure:

3:487  0:000  Framebuffer @0x90000000  MMIO @0xA0000000 I/O Port @0x0000E000 ROM Addr @0xDE440000
3:487  0:000  PCI region 1 = 0x00000000, region3 = 0x00000000, region5 = 0xDE400000
3:487  0:000  Use region5 as MMIO space
3:487  0:000  BIOS_0_SCRATCH=0x00000000, 1=0x00000000, 2=0x00000003, 3=0x00000000, 4=0x00000000, 5=0x00000000, 6=0x00000000
3:488  0:001  RADEON_CRTC2_GEN_CNTL == 0x00000000
3:488  0:000   card posted because CONFIG_MEMSIZE=0x2000
3:488  0:000  ATI card POSTed,
3:488  0:000  Set VRAM for Ellesmere =8192Mb
3:488  0:000  ATI: get_vram_size returned 0x0
3:488  0:000  ATI Radeon EVERGREEN+ family
3:488  0:000  Users config name Berbice
3:488  0:000  (AtiPorts) Nr of ports set to: 5
3:488  0:000  ATI Ellesmere AMD Radeon RX480 8192MB (Berbice) [1002:67DF] (subsys [174B:E347]):: PciRoot(0x0)\Pci(0x1,0x0)\Pci(0x0,0x0)
3:489  0:000  Framebuffer @0x60000000  MMIO @0x70000000 I/O Port @0x0000C000 ROM Addr @0xC8040000
3:489  0:000  PCI region 1 = 0x00000000, region3 = 0x00000000, region5 = 0xC8000000
3:489  0:000  Use region5 as MMIO space
3:489  0:000  BIOS_0_SCRATCH=0x00000000, 1=0x00000000, 2=0x00000001, 3=0x00000000, 4=0x00000000, 5=0x00000000, 6=0x00000000
3:491  0:001  RADEON_CRTC2_GEN_CNTL == 0x00000000
3:491  0:000   card posted because CONFIG_MEMSIZE=0x1000
3:491  0:000  ATI card POSTed,
3:491  0:000  Set VRAM from config=8192Mb
3:491  0:000  ATI: get_vram_size returned 0x0
3:491  0:000  ATI Radeon EVERGREEN+ family
3:491  0:000  Users config name Berbice
3:491  0:000  (AtiPorts) Nr of ports set to: 3
3:491  0:000  ATI Baffin AMD Radeon RX460 8192MB (Berbice) [1002:67EF] (subsys [174B:E344]):: PciRoot(0x0)\Pci(0x1B,0x0)\Pci(0x0,0x0)\Pci(0x4,0x0)\Pci(0x0,0x0)\Pci(0x1,0x0)\Pci(0x0,0x0)

Take a look at this line: "3:489  0:000  Framebuffer @0x60000000  MMIO @0x70000000 I/O Port @0x0000C000 ROM Addr @0xC8040000"

Maybe these values might be useful.

 

Also a useful note: when booting the RX460 via the AKiTiO Node last time, it got AMDRadeonFramebuffer as the default framebuffer. This time it automatically gets BERBICE as the framebuffer. Nothing changed in the config.


I'm heading down a different route here, which is to try and prevent the GOP driver on the AMD card from ever being loaded and started in Clover.  I'm hacking my way back through the bootloader ... figured out the driver is being loaded somewhere before RefitMain ... perhaps somewhere in the BDS or DXE phases.  Unfortunately it's harder to experiment here, since I can't log to see what's happening in these early phases.  My goal is to A. find a way to identify the offending driver and B. prevent it from being loaded.

 

Any tips for debugging in these phases?  Is the only answer attaching a console to a serial port?


I'm heading down a different route here, which is to try and prevent the GOP driver on the AMD card from ever being loaded and started in Clover.  I'm hacking my way back through the bootloader ... figured out the driver is being loaded somewhere before RefitMain ... perhaps somewhere in the BDS or DXE phases.  Unfortunately it's harder to experiment here, since I can't log to see what's happening in these early phases.  My goal is to A. find a way to identify the offending driver and B. prevent it from being loaded.

 

Any tips for debugging in these phases?  Is the only answer attaching a console to a serial port?

You can use legacy Clover (with the boot file), which contains its own BDS and DXE drivers, so you can debug them.


You can use legacy Clover (with the boot file), which contains its own BDS and DXE drivers, so you can debug them.

I made a pretty silly mistake and was apparently digging through OsxDxeCore for the past couple of nights before realizing today that none of that code actually runs in UEFI boot mode. :(

 

Do Clover and the resulting BOOTX64.EFI run the DXE and BDS phases?  If so, do you have pointers as to where in the code I should be looking?

 

I can't run this stuff in legacy mode because the BIOS will apparently init the card's EFI in legacy mode, which sort of defeats the purpose of what I'm trying to do, which is block the GOP driver from loading.

 

EDIT: I'm guessing I should have been digging in the MdeModulePkg?


Clover == bootx64.efi doesn't contain DXE init; that is performed by the EFI environment.

In the case of legacy Clover, by DUET (boot6). See the CloverEFI subproject; yes, it includes MdeModulePkg.

In the case of UEFI boot, by your UEFI BIOS.

Clover works in the DXE phase set up by your BIOS.


I've been reading this thread because I have a Polaris GPU, and I've noticed some mistakes. First, the memory size is being read incorrectly from MMIO or IO space, and the BAR is retrieved incorrectly as well.

The BARs should be accessed in order from 0x10 to 0x24 (there are six) in PCI configuration space; the value will be zero if a BAR is unused (if all are unused they need to be enabled in the control register and set up manually - a lot of work). The first may be a passthru address; I'm not sure, it seems to be very hard to determine from the documentation.

Also, the bottom 4 bits that are being truncated contain information relevant to the location of the memory space. If bit 0 is set then the space is IO, otherwise it is MMIO; this changes the method by which you can access the memory. If it is MMIO, then the next two bits specify the size of the BAR address: if bits 1 and 2 are unset it is a 32-bit address; if bit 2 is set but bit 1 is unset it is a 64-bit address (then 0x14, 0x1C and 0x24 hold the high-order dwords for BAR 1, BAR 3, and BAR 5); and if only bit 1 is set it's only a 20-bit address. Bit 3 is prefetchable, probably not important. These bits should be inspected, and the address should be used as the base for register reads, offset by the register, without modifying it, like Mmio/IoRead32/64(Base+Register).

The offset for the memory size would be 0x5430 from MMIO (this is already multiplied by 4 instead of doing it at runtime, and I'm not sure whether it is accessible through IO space), although I could find no documentation for this other than the Linux driver. In older documentation the offset is 0xF8 from both IO and MMIO.

 

EDIT: Also, the memory size in the older documentation is already shifted left 20 bits, so the highest size would be 4GB. I imagine HD6XXX or HD5XXX were the last to use this register, then.


EDIT#1: Luxmark test with only the Sapphire RX460 (36 CUs patch enabled) - but remember: the eGFX is connected to the MacBook Pro late 2013 via Thunderbolt2:

 

luxmark_rx460_36_CUs.jpg

 

RX 460 with 36 CUs? Is this a typo?

I've got a similar score for the RX 480 with the Berbice framebuffer injected, but with RadeonFramebuffer it's about twice as high.


I've been reading this thread cause I have a Polaris GPU. I've noticed some mistakes. First that the memory is being read incorrectly from MMIO or IO space

STRT="sh"

END="it"

 

echo "No $STRT$END."  If we knew how to read it correctly, we wouldn't be manually setting the VRAM amount.  Why we can't get a valid VRAM value at the only place we know to look is the million dollar question.

 

 

also that the BAR is retrieved incorrectly as well.

 

No they are not. 

 

First, the BARs should be accessed in order from 0x10 to 0x24

 

They are.  

 

 

Your description of the bit fields of the BAR registers is totally correct.  But there is a difference between reading registers incorrectly and hard-coding the correct way to read registers.  AMD/ATI cards have known BAR layouts.  For newer cards, it is as follows:

 

 

BAR0 Low bits  ━┓
                 ┣━ 64-bit MMIO Region - This is the VRAM Aperture.  Since GPUs need to maintain 32-bit address space backward-compatibility, 
                 ┃   one accesses the VRAM via an 'aperture', which can be thought of as a moving address 'window', allowing a portion of the 
                 ┃   available VRAM to be mapped to a single MMIO region.  Also, I believe if an IGPU is present, but the discrete card is primary, 
                 ┃   the IGPU's VBIOS is placed at the start of this aperture, followed by the beginning of the VRAM passthrough
BAR1 High bits ━┛

BAR2 Low bits  ━┓
                 ┣━ 64-bit MMIO Region - This is the doorbell aperture MMIO region.  Doorbells are how the driver queues up rings (jobs),
                 ┃   manages fences (jobs that depend on the completion of other jobs first), etc.  
BAR3 High bits ━┛

BAR4 ━━━━━━━━━ IO Space, 0x100 in length. Allows direct GPIO control.  This is generally used for things like voltage control, 
                     power management, reading sensors, sort of hardware 'janitorial' stuff. 

BAR5 ━━━━━━━━━ 32-bit MMIO Region, always 0x00040000 in length.  This is always the actual register base address.  

Every card from Bonaire onwards has had this layout, and that includes the latest Polaris GPUs. This is exactly how Clover uses them - and that way is correct. Indeed, it is hardcoded, and I agree that in a perfect world it ought to figure out as much as possible algorithmically rather than hard-coding it. But the problem does not lie here, at least not directly. Clover correctly gets all the regions right, and you can confirm this by dumping the PCI config spaces from your BIOS's native EDK2 shell.

 

 

It seems like there is a lot of misunderstanding by people (myself included), and people (again, myself included) exploring things that aren't important. Let me try to consolidate what we know and what is or isn't useful:

 

 

1.  There is no incompatibility between Nvidia and AMD cards in the same system.  I'm currently running a GTX 960 as my primary 'helper' card, along with my display hooked up to an RX480.  I can switch between the two without issue.  Both drivers are enabled and working.  I can use OpenCL on both, and CUDA on the Nvidia GPU.  There is some sort of myth about there being problems, but there aren't any.  Testers, don't think you need to use a specific type of helper card, because you don't.  Anything will do.

 

2.  This issue actually affects Tonga and newer AMD cards on Sierra, including real Mac Pros.  The cheese grater ones.  Mac Pros do not have any boot screen, presumably because the driver won't work if those cards are initialized by EFI.  I have confirmed this by suffering a similar issue with a helper card set as primary, but using my card in EFI mode.

 

3.  This problem was seen on Linux as well.  Kernel 4.7.4 had a strange problem where using rEFIt or GRUB EFI would result in a black screen once the amdgpu driver loaded.  The problem was fixed by a driver patch that was part of the changes from 4.7.4 to 4.7.5, so it stopped after kernel version 4.7.5.  It is possible that a similar issue was, and continues to be, present in Apple's own driver.  Being Apple, it is possible that this bug will never be fixed until the universe has suffered heat death.

 

4.  We have an advantage in that we ought to be able to display video using legacy mode in Clover, yet this does not work either.  This makes me wonder whether the bug is related to the one in the Linux kernel at all.  It really seems that the only way macOS can correctly reach the desktop is if no video at all has yet been displayed on the card, regardless of the origin.

 

5.  The screen is displaying black.  This is distinct from the display being inactive.  It is inactive during a boot that will reach the desktop, but if you have video during boot, it will hang right before reaching the desktop, but the screen will be active.  It will simply be displaying all black, however.  That means this is not some connector issue.  The connector is working, the EDID has been read, the card has a framebuffer attached, and HDMI/DVI/DisplayPort/whatever signals are being sent in just the right way for that display, and those signals are telling it to display black.  A lot of things that people seem to be spending a lot of time on must already be working correctly for this to happen.  That's a dead end.  Don't waste time on it.  

 

6.  If I boot with my helper card primary, but EFI active, I will get video during boot.  Sierra still hangs without ever reaching the desktop, but ssh works.  The OS boots, but the graphics driver is stuck.  However, in this case, the screen is not blank - a characteristic pattern of dots always appears in the same pattern, and the same location, on my screen.  They are patterned in a way that looks like the driver was trying to write to some adjacent registers somewhere in an MMIO region, but the writes instead went partway into the framebuffer.  Also, if I relocate the boot console to the second (RX480) display, the buffer is not cleared - it will display the last frame before the hang indefinitely.  HOWEVER, those dots will overwrite whatever is displayed, along with a small bar across my entire screen.  This would be consistent with writing a large continuous block of zeros to the framebuffer.  It really seems like the driver is getting confused about where the framebuffer object is actually located in the VRAM aperture.  It seems like maybe the act of displaying video prior to the driver loading allocates a framebuffer in the VRAM aperture where the driver expects to find nothing.  More on that later, potentially.  I am still slogging through the Linux amdgpu drivers, which, as fantastically documented as they are (note: I say that with the heaviest sarcasm), are going to take more time.  I might be chasing nothing at all, who knows.

 

7.  The actual reason the screen is black/frozen?  The GPU is hung.  There are repeating "IOAccelSurface2::surface_unlock_options(enum eLockType, uint32_t): surface is not locked" faults, several in a row.  It looks like it is locking a non-existent framebuffer/surface, presumably after attempting to write to it.  And, indeed, besides that, I see repeating "IOAccelFenceMachine::fence_timeout(IOTimerEventSource *): prodding blockFenceInterrupt" faults.  That means it is waiting for some critical enqueued task to finish on the GPU (like finishing a pass on a framebuffer).  Only, it waits like this forever.  The GPU never finishes.

 

 

If I wait long enough, it eventually tries to reset the GPU, but gets a PCIe channel hardware error.  The GPU isn't just hung, it's pretty much {censored}ed.  Nothing will un-{censored} it except a full reboot.

 

So the question is: what is causing the driver to lose its mind and apparently not even be able to create one framebuffer correctly if the card has been displaying video in EFI mode, or VGA/VESA video in legacy mode?  What has changed on the card?  The IO and MMIO regions are the same either way.  And those are correct regardless.  No issue there.

 

This is gonna be a {censored} chafe to fix.  

 

 

 

Characteristic dots:

 

upshot_IO3RtQbJ.png


Characteristic dots:

 

Interesting, I hadn't noticed before, but I do get those dots as well. They are located near the top of my screen vertically, and just to the right of the middle horizontally (on 3840x2160). I get a vertical strip that's blacked out too; apart from that I still get the verbose boot output.

 

post-1658738-0-21849700-1492598339_thumb.jpg

 

 

I've taken a couple of macro shots of the dots. Maybe to someone more familiar with graphics drivers they might give a clue to what sort of pattern is being written there.

 

post-1658738-0-44868000-1492598492_thumb.jpg
post-1658738-0-03626400-1492598498_thumb.jpg

 


Has anyone tried booting a Polaris card into Recovery Mode from PCIe? My 380X works that way (but of course normal macOS won't, except through the iGPU). Might be interesting to see what's handled differently in recovery vs. normal boot.

 

The only way I know to boot normal macOS from PCIe is by essentially preventing the AMD GOP driver from loading (process described here). You won't be able to see your BIOS, bootloader, or Apple loading screen, but this at least keeps the iGPU from running in the background.


Has anyone tried booting a Polaris card into Recovery Mode from PCIe? My 380X works that way (but of course normal macOS won't, except through the iGPU). Might be interesting to see what's handled differently in recovery vs. normal boot.

 

The only way I know to boot normal macOS from PCIe is by essentially preventing the AMD GOP driver from loading (process described here). You won't be able to see your BIOS, bootloader, or Apple loading screen, but this at least keeps the iGPU from running in the background.

I can also boot into Recovery Mode and have full acceleration with my RX 480, using only a fake ID, with PCIe set as primary in the BIOS; it also works with the installer. Since I am getting full acceleration, that means the gfx drivers are getting loaded.


As you all might know, I am NOT a real coder, but I found this within the following path: edk2/Clover/CloverEFI/GraphicsConsoleDxe/GraphicsConsole.c:

 

 

/**
  Stop this driver on Controller by removing Simple Text Out protocol
  and closing the Graphics Output Protocol or UGA Draw protocol on Controller.
  (UGA Draw protocol could be skipped if PcdUgaConsumeSupport is set to FALSE.)


  @param  This              Protocol instance pointer.
  @param  Controller        Handle of device to stop driver on
  @param  NumberOfChildren  Number of Handles in ChildHandleBuffer. If number of
                            children is zero stop the entire bus driver.
  @param  ChildHandleBuffer List of Child Handles to Stop.

  @retval EFI_SUCCESS       This driver is removed Controller.
  @retval EFI_NOT_STARTED   Simple Text Out protocol could not be found the
                            Controller.
  @retval other             This driver was not removed from this device.

**/
EFI_STATUS
EFIAPI
GraphicsConsoleControllerDriverStop (
  IN  EFI_DRIVER_BINDING_PROTOCOL   *This,
  IN  EFI_HANDLE                    Controller,
  IN  UINTN                         NumberOfChildren,
  IN  EFI_HANDLE                    *ChildHandleBuffer
  )
{
  EFI_STATUS                       Status;
  EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL  *SimpleTextOutput;
  GRAPHICS_CONSOLE_DEV             *Private;

  Status = gBS->OpenProtocol (
                  Controller,
                  &gEfiSimpleTextOutProtocolGuid,
                  (VOID **) &SimpleTextOutput,
                  This->DriverBindingHandle,
                  Controller,
                  EFI_OPEN_PROTOCOL_GET_PROTOCOL
                  );
  if (EFI_ERROR (Status)) {
    return EFI_NOT_STARTED;
  }

  Private = GRAPHICS_CONSOLE_CON_OUT_DEV_FROM_THIS (SimpleTextOutput);

  Status = gBS->UninstallProtocolInterface (
                  Controller,
                  &gEfiSimpleTextOutProtocolGuid,
                  &Private->SimpleTextOutput
                  );

  if (!EFI_ERROR (Status)) {
    //
    // Close the GOP or UGA IO Protocol
    //
    if (Private->GraphicsOutput != NULL) {
      gBS->CloseProtocol (
            Controller,
            &gEfiGraphicsOutputProtocolGuid,
            This->DriverBindingHandle,
            Controller
            );
    } else if (FeaturePcdGet (PcdUgaConsumeSupport)) {
      gBS->CloseProtocol (
            Controller,
            &gEfiUgaDrawProtocolGuid,
            This->DriverBindingHandle,
            Controller
            );
    }

    if (Private->LineBuffer != NULL) {
      FreePool (Private->LineBuffer);
    }

    if (Private->ModeData != NULL) {
      FreePool (Private->ModeData);
    }

    //
    // Free our instance data
    //
    FreePool (Private);
  }

  return Status;
}

 

 

and I thought: maybe there is a way to call this routine after detecting a Radeon R3xx or newer card, to force an unload of the EFI driver once the user hits ENTER to boot from the Clover screen into macOS. That would unload the EFI driver and stop the output of the macOS boot log, but give the AMD drivers a chance to load when a detected AMD card is the only and primary GFX.

 

I hope it is clear what I mean: leave the EFI driver loaded until the user has made his Clover settings. Then, right after hitting ENTER to start the boot process into macOS, unload the EFI driver from the controller, so macOS will be able to load the AMD gfx driver.

 

What do you think? Oh, and please don't forget: this is just the suggestion of a possible WORKAROUND, not a final solution.
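For what it's worth, it might not even be necessary to call Stop() directly: the UEFI boot services expose DisconnectController(), which makes the firmware invoke the bound driver's Stop() routine (i.e. GraphicsConsoleControllerDriverStop() above) for you. A hypothetical sketch, assuming the AMD card's controller handle has already been located (UnbindGopFromController is my own name, not existing Clover code):

```c
//
// Hedged sketch, NOT part of Clover: ask the firmware to stop every
// driver bound to GpuHandle, which runs each driver's Stop() routine,
// including GraphicsConsoleControllerDriverStop() quoted above.
//
EFI_STATUS
UnbindGopFromController (
  IN EFI_HANDLE  GpuHandle    // handle of the detected AMD controller
  )
{
  //
  // DriverImageHandle == NULL -> disconnect ALL drivers on this handle;
  // ChildHandle == NULL       -> destroy all child handles as well.
  //
  return gBS->DisconnectController (GpuHandle, NULL, NULL);
}
```

Clover could call something like this right after the user hits ENTER and before it hands off to the macOS boot process, which is essentially the workaround described above.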

  • Like 2
Link to comment
Share on other sites

I've read somewhere that if you don't connect any display output, the GOP won't load, according to the UEFI spec. Can someone try that? Like, first remove any DP/HDMI connector, then power up (in EFI mode), then connect the display later (a timer on the boot process may be needed).

 

When a GPU has no video output device physically connected during a GOP driver binding Start() execution, neither child handles nor GraphicsOutputProtocol will be created or installed. – The platform has to either use another GPU (in multiple GPU present case) or other protocols for console output.

 

http://www.uefi.org/sites/default/files/resources/UPFS11_P4_UEFI_GOP_AMD.pdf (page 10)

 

 


  • Like 1
Link to comment
Share on other sites

I guess it was related to the fact that my monitor is an unbranded one, which doesn't respond to the power signal nicely (e.g., if I previously used DVI and plug in DP/HDMI, it won't switch and just powers itself off, no matter Windows/Linux/Mac).

 

I have used a helper card, but my monitor's color profile became broken after the video output switched to the 480. The system becomes really noisy and hot. Although I can get a good LuxMark score (14000ish), the real-world performance sucks: the FPS in Dota 2 is in the range of 20-60, while my GTX 1060 gives me a constant 60+ fps (on Windows ~140).

 

At this point, I am really disappointed with this card and its support status in macOS. The only reason I went for AMD is that they don't have iBook- or Metal-related bugs, but AMD introduced more issues when it's an unsupported 480.

 

I really hope we can find an elegant solution, best to be from AMD.

 

about 20 secs

Link to comment
Share on other sites


It's already possible to unload the GOP driver at the EFI shell. I have tried it, and it does not seem to help anything. The EFI driver must be prevented from ever loading.

The problem is that booting in pure UEFI mode enumerates and loads all EFI drivers in PCI option ROMs, and there is no mechanism for blacklisting a particular EFI driver in the BIOS. I have been thinking about playing around with the Fast Boot settings in my BIOS, but haven't gotten around to it yet. Everything is a long shot, though.

It may be possible in a legacy-mode boot to explicitly reject the AMD GOP driver, since at that point Clover is taking over the DXE and BDS phases, I believe. However, this approach doesn't interest me much, so I haven't pursued it.
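For reference, the shell experiment mentioned here uses the standard UEFI Shell commands: `drivers` lists every loaded driver together with its handle number, and `unload` removes a driver image by that handle. A rough transcript (the handle value 4F is just an example; `drivers` prints the real one on your system):

```
Shell> drivers
  (scan the list for the AMD GOP driver and note its handle, e.g. 4F)
Shell> unload 4F
```

As noted above, though, by the time you can do this the card has already been initialized by the GOP driver, so unloading it after the fact doesn't help.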

Link to comment
Share on other sites

I found out that I can actually hear which state my Sapphire RX460 is in:

  • The card isn't initialized: the fans are stopped and it is completely quiet
  • The card is initialized by the UEFI / GOP driver: the fans are running and quite noisy (it seems there isn't any fan control)
  • When an OS starts controlling the initialized card (tested with Linux), the fans slow down
  • When an OS starts controlling the not-initialized card (tested with Linux and macOS, iGPU is primary), the fans spin up but slow down immediately.

So, I made a few tests and "heard" the result:

 

The RX460 is the primary and only video card:

Boot to Recovery HD, one DVI display connected:

  • The fans get noisy right after the BIOS beep, and the system freezes at a blank recovery background. The fans stay noisy, so no kext loaded successfully!

Boot to normal Mac OS, one DVI display connected:

  • The fans get noisy right after the BIOS beep, and the system boots to a black screen. The fans also stay noisy, so no kext loaded successfully.

Boot to normal Mac OS, no display connected:

  • Same result as with a display; there is no signal when I connect the DVI display after one minute. The fans stay noisy.

It seems like my UEFI (GA-H81M-HD3) always loads the GOP driver, even without any display connected. Also, there seems to be no difference between Recovery mode and a normal boot.

Link to comment
Share on other sites
