vit9696

AptioMemoryFix


Good to have. Well, the second picture makes it very clear. XNU kernel invokes APTIO RuntimeServices SetVariable code, and then this code never returns. 

 

What we have in SetVariable is the following code from NvramDxe. As far as I can tell it has not changed at all since the source leak, and the version in the leaked source is known to work.

 

 

UINTN GetVariableNameSize(IN CONST CHAR16 *String, IN UINTN MaxSize){
    CHAR16 *Str, *EndOfStr;
    ASSERT(String!=NULL);
    if (String==NULL) return 0;
    
    EndOfStr = (CHAR16*)((UINT8*)String + MaxSize);
    for(Str = String; Str < EndOfStr; Str++)
        if (!*Str) return (Str - String + 1)*sizeof(CHAR16);

    return MaxSize+1;
}

EFI_STATUS Communicate (UINTN MessageLength){
    UINTN CommSize;
    UINT64 Control; 
    EFI_STATUS Status;
    
    if (SmmCommProtocol==NULL) return EFI_UNSUPPORTED;
    if (   NvramSmmCommunicationBuffer == NULL 
        || NvramSmmCommunicationBufferPhysicalAddress == NULL
    ) return EFI_OUT_OF_RESOURCES;
    if (MessageLength > MaxMessageLength) return EFI_OUT_OF_RESOURCES;

    Control = NvramSmmCommunicationBuffer->Control;
    NvramSmmCommunicationBuffer->MessageLength = MessageLength;
    CommSize = CommunicationHeaderSize + MessageLength;
    Status = SmmCommProtocol->Communicate (SmmCommProtocol, NvramSmmCommunicationBufferPhysicalAddress, &CommSize);

    if (EFI_ERROR(Status)) return Status;
    if (NvramSmmCommunicationBuffer->Control == Control)
            return EFI_NO_RESPONSE;
    if ((NvramSmmCommunicationBuffer->Control & NVRAM_SMM_ERROR_BIT)!=0)
        Status = NVRAM_SMM_STATUS_TO_EFI_STATUS(NvramSmmCommunicationBuffer->Control);
    return Status;
}

EFI_STATUS DxeSetVariableSmmWrapper (
    IN CHAR16 *VariableName, IN EFI_GUID *VendorGuid,
    IN UINT32 Attributes, IN UINTN DataSize, IN VOID *Data
)
{
    EFI_STATUS Status;
    UINTN AvailableBufferSize, VariableNameSize;
    SMI_SET_VARIABLE_BUFFER *SetBuffer;

    if (NvramSmmCommunicationBuffer == NULL) return EFI_UNSUPPORTED;
    if (!VariableName || !VendorGuid || (DataSize && !Data))
        return EFI_INVALID_PARAMETER;
    
    AvailableBufferSize = MaxMessageLength - sizeof(SMI_SET_VARIABLE_BUFFER);
    VariableNameSize = GetVariableNameSize(VariableName, AvailableBufferSize);
    
    // If variable name or data is too large to fit into our buffer, it is also too large to fit
    // into NVRAM store.
    if (AvailableBufferSize < VariableNameSize) return EFI_OUT_OF_RESOURCES;
    AvailableBufferSize -= VariableNameSize;
    if (AvailableBufferSize < DataSize) return EFI_OUT_OF_RESOURCES;

    SetBuffer = (SMI_SET_VARIABLE_BUFFER *)&NvramSmmCommunicationBuffer->Control;
    SetBuffer->Control = NVRAM_SMM_COMMAND_SET_VARIABLE;
    SetBuffer->Attributes = Attributes;
    SetBuffer->DataSize = DataSize;
    SetBuffer->Guid = *VendorGuid;
    SetBuffer->VariableNameSize = VariableNameSize;
    MemCpy(SetBuffer+1, VariableName, VariableNameSize);
    MemCpy((UINT8*)(SetBuffer+1)+VariableNameSize, Data, DataSize);
    
    Status = Communicate( sizeof(SMI_SET_VARIABLE_BUFFER) + VariableNameSize + DataSize );

    return Status;
}

EFI_STATUS DxeSetVariableSafe(
    IN CHAR16 *VariableName, IN EFI_GUID *VendorGuid,
    IN UINT32 Attributes, IN UINTN DataSize, IN VOID *Data
)
{
    EFI_STATUS Status;

    BEGIN_CRITICAL_SECTION(NvramCs);
    if (NvramSmmIsActive)
        Status = DxeSetVariableSmmWrapper(
                     VariableName,VendorGuid,Attributes,DataSize,Data
                 );
    else
        Status = DxeSetVariableWrapper(
                     VariableName,VendorGuid,Attributes,DataSize,Data
                 );
    END_CRITICAL_SECTION(NvramCs);
    return Status;
}

 

 

The code relevant to SMM switching looks the same too, and the EFI_SMM_COMMUNICATION_PROTOCOL implementation is provided by EDK2. They still allocate the SMM communication buffer as EfiRuntimeServicesData, and still pass its address via NvramMailbox NVRAM variable, so it should be guarded by AptioMemoryFix. As a result I believe that the infinite loop happens somewhere on the way to NvramSmm (which now represents the former Smi and Smm code glued together). However, a brief check of the binary and the source shows that the SMI handler (NvramSmmCommunicationHandler, SetVariableSmmHandler) is pretty much the same too. This leaves us in an uneasy situation where we do not know where to look for the problem.

 

What I could suggest is writing an EFI runtime driver (by ripping off the known APTIO V source) that will reimplement communication with SMM:

1. Allocate a new communication buffer.

2. Check and overwrite the address of the old communication buffer in the NvramMailbox variable.

3. Overwrite the EFI_RUNTIME_SERVICES variable functions with the APTIO code, but using the new communication buffer.

 

The above gives us the complete path prior to the SMM code under our control. Afterwards we should be able to get this code fully functional on some working APTIO V system (e.g. Skylake or Kaby Lake), and then try it on the new problematic system. By changing the logic via the return codes it should be easy to determine where the issue is: the DXE or the SMM driver. Beyond that, it may even help us understand whether the SMI handler exists at all.
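A minimal sketch of the hooking idea, written as plain C so it can be compiled outside a firmware build. All types and names here (`RuntimeServices`, `HookSetVariable`, `TriggerSmm`, the `STAGE_*` codes) are stand-ins invented for illustration, not the APTIO or EDK2 definitions, and step 2 (rewriting the mailbox variable) is left out. The replacement handler returns a distinct code for each stage it reaches, which is how a return value could separate a DXE-side failure from an SMM side that never answers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; a real driver would use the EDK2 definitions instead. */
typedef uint64_t Status;
typedef Status (*SetVariableFn)(const uint16_t *Name, uint32_t Attributes,
                                size_t DataSize, const void *Data);

typedef struct {
    SetVariableFn SetVariable;   /* slot the OS calls at runtime */
} RuntimeServices;

/* Hypothetical diagnostic codes, one per stage of the path. */
enum {
    STAGE_OK            = 0,
    STAGE_BAD_ARGS      = 1,  /* rejected before touching the buffer */
    STAGE_BUFFER_FULL   = 2,  /* request does not fit the new buffer */
    STAGE_SMM_NO_ANSWER = 3   /* control word was never rewritten */
};

#define COMMAND_MARKER 0xC0FFEEu  /* arbitrary value for this sketch */

typedef struct {
    uint64_t Control;
    uint8_t  Payload[256];
} CommBuffer;

CommBuffer    gNewBuffer;        /* step 1: our own communication buffer */
SetVariableFn gOriginalSet;      /* saved so the hook can be undone */
int           gSmmResponds = 1;  /* simulation knob: does "SMM" answer? */

/* Simulated SMM trigger: a responding handler rewrites Control. */
void TriggerSmm(CommBuffer *Buf) {
    if (gSmmResponds) Buf->Control = 0;
}

Status HookedSetVariable(const uint16_t *Name, uint32_t Attributes,
                         size_t DataSize, const void *Data) {
    const uint8_t *Src = (const uint8_t *)Data;
    size_t i;

    (void)Attributes;
    if (Name == NULL || (DataSize != 0 && Data == NULL)) return STAGE_BAD_ARGS;
    if (DataSize > sizeof(gNewBuffer.Payload)) return STAGE_BUFFER_FULL;

    for (i = 0; i < DataSize; i++) gNewBuffer.Payload[i] = Src[i];

    gNewBuffer.Control = COMMAND_MARKER;
    TriggerSmm(&gNewBuffer);
    if (gNewBuffer.Control == COMMAND_MARKER) return STAGE_SMM_NO_ANSWER;
    return STAGE_OK;
}

/* Step 3: swap the table entry, keeping the original for later restore. */
void HookSetVariable(RuntimeServices *Rt) {
    gOriginalSet = Rt->SetVariable;
    Rt->SetVariable = HookedSetVariable;
}

void UnhookSetVariable(RuntimeServices *Rt) {
    Rt->SetVariable = gOriginalSet;
}
```

The unchanged-control-word test mirrors the EFI_NO_RESPONSE check in the Communicate function quoted above; everything else is scaffolding for the simulation.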

 

If it is SMM, I would probably try replacing NvramSmm with NvramSmm & NvramSmi from some Z370 BIOS first and reflashing the firmware. Then… perhaps reverse-engineer/reimplement NvramSmm with the new changes and try to debug it too.

 

If you like the idea, I can share APTIO src and let you proceed.

Edited by vit9696


Hi vit9696,

 

Yes, an uneasy situation indeed...

 

Meanwhile I ran a test, with a rather strange result.

I set a simple override for SetVariable, something like:

EFI_STATUS
EFIAPI
OvrSetVariable(
        IN CHAR16                       *VariableName,
        IN EFI_GUID                     *VendorGuid,
        IN UINT32                       Attributes,
        IN UINTN                        DataSize,
        IN VOID                         *Data
)
{
	// Apple boot variable GUID (7C436110-AB2A-4BBB-A880-FE41995C9F82), hard-coded for the test.
	EFI_GUID gEfiAppleBootGuid = {0x7C436110, 0xAB2A, 0x4BBB, {0xA8, 0x80, 0xFE, 0x41, 0x99, 0x5C, 0x9F, 0x82}};
	// Ignore the caller's arguments and always write the same 4-byte value.
	return gOrgRS.SetVariable(L"TestVar", &gEfiAppleBootGuid, EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_RUNTIME_ACCESS, 4, "1234");
}

... which for some reason didn't panic on restart (obviously "TestVar" was updated first by Clover's call to SetVariable, so I don't think it actually tried to write anything to NVRAM on restart).

 

But then I changed only the preset apple boot guid to:

return gOrgRS.SetVariable(L"TestVar", VendorGuid, EFI_VARIABLE_NON_VOLATILE | EFI_VARIABLE_BOOTSERVICE_ACCESS | EFI_VARIABLE_RUNTIME_ACCESS, 4, "1234");

... which resulted again in panic on restart. Can't really explain why. Perhaps you have an idea.

EDIT: Maybe it panics only when it actually has data to change? Otherwise it probably just returns EFI_SUCCESS and exits.
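One way to picture the hypothesis in the EDIT is a variable store that skips the write path when the new value equals the stored one. This is a simulation with invented names (`Store`, `StoreSet`), not the APTIO implementation; whether the real NVRAM driver short-circuits like this is exactly the open question.

```c
#include <stddef.h>
#include <string.h>

#define STORE_MAX 64

/* Hypothetical single-variable store used only to illustrate the idea. */
typedef struct {
    unsigned char Data[STORE_MAX];
    size_t        Size;
    unsigned      FlashWrites;   /* counts how often the slow path runs */
} Store;

/* Returns 0 on success, -1 if the arguments are unusable. */
int StoreSet(Store *S, const void *Data, size_t Size) {
    if (Data == NULL || Size > STORE_MAX) return -1;

    /* Identical contents: report success without touching "flash". */
    if (Size == S->Size && memcmp(S->Data, Data, Size) == 0) return 0;

    memcpy(S->Data, Data, Size);
    S->Size = Size;
    S->FlashWrites++;            /* the path that would need SMM */
    return 0;
}
```

Under this model, writing "TestVar" under a fixed GUID that Clover has already populated never reaches the slow path, while writing it under each caller-supplied GUID creates new variables and does.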

 

Regarding your suggestion, it sounds like a good plan, but it's a bit too big for me at the moment, considering my limited experience with runtime EFI drivers combined with the limited free time that I currently have. But if someone else is up for this task, I'll be more than willing to test it on the Z390.

 

Edited by Pene

On 10/28/2018 at 1:54 PM, vit9696 said:

They still allocate the SMM communication buffer as EfiRuntimeServicesData, and still pass its address via NvramMailbox NVRAM variable, so it should be guarded by AptioMemoryFix. 

By the way, no chance it is not guarded? How can we check this?


Hey DF! How are you? Good to see you around :)

Yes, I know it should. But I was referring more to a situation in which, for some reason, guarding doesn't work on the newer Aptio.

Quote

EDIT: Maybe it panics only when it actually has data to change? Otherwise it probably just returns EFI_SUCCESS and exits.

Could be, I guess.

Quote

Yes, I know it should. But I was referring more to a situation in which, for some reason, guarding doesn't work on the newer Aptio.

It is very hard to believe, to be honest. You can probably verify it by asserting that NumEntriesLeft is 0 at the end of this function:

https://github.com/acidanthera/AptioFixPkg/blob/6a21a30d090721ff4620dd22bb91d4dca93e8db4/Platform/AptioMemoryFix/RtShims.c#L302
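A sketch of that check, under the assumption that the function walks the memory map and restores the type of every region it protected earlier, decrementing a counter as it goes. The structures and names below are simplified stand-ins, not the AptioFixPkg definitions; only the idea of ending with zero unmatched entries comes from the suggestion above.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for a memory map entry and a protected-region record. */
typedef struct { uint64_t PhysicalStart; uint32_t Type; } MapEntry;
typedef struct { uint64_t PhysicalStart; uint32_t OriginalType; } Protected;

/* Restores original types in Map and returns how many protected records
   found no matching map entry (0 means everything was matched). */
size_t RestoreTypes(MapEntry *Map, size_t MapCount,
                    const Protected *Regions, size_t RegionCount) {
    size_t NumEntriesLeft = RegionCount;
    size_t i, j;

    for (i = 0; i < MapCount && NumEntriesLeft > 0; i++) {
        for (j = 0; j < RegionCount; j++) {
            if (Map[i].PhysicalStart == Regions[j].PhysicalStart) {
                Map[i].Type = Regions[j].OriginalType;
                NumEntriesLeft--;
                break;
            }
        }
    }
    return NumEntriesLeft;
}
```

A non-zero result would mean at least one protected runtime region was not found in the map handed to the kernel, which is the "not guarded" case being asked about.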

 

21 hours ago, vit9696 said:

It is very hard to believe, to be honest. You can probably verify it by asserting that NumEntriesLeft is 0 at the end of this function:

https://github.com/acidanthera/AptioFixPkg/blob/6a21a30d090721ff4620dd22bb91d4dca93e8db4/Platform/AptioMemoryFix/RtShims.c#L302

Thanks, as you expected, it is 0.

What puzzles me is that if it is indeed a DXE/SMM issue, shouldn't it have affected Windows/Linux NVRAM writes as well? That's why I thought it must be something specific to the Mac-specific reallocation.

 

Just for reference, here is the memory map from my system:

As output from the shell: memmap.txt

And by AptioMemoryFix: [three attached photos: IMG-6393, IMG-6395, IMG-6397]

 

 

 

Edited by Pene


Hi

 

Sorry for my beginner's question: where should I install cleannvram.efi in the EFI folder, Tools or drivers64UEFI?

 

Thanks

1 hour ago, Matgen84 said:

Hi

 

Sorry for my beginner's question: where should I install cleannvram.efi in the EFI folder, Tools or drivers64UEFI?

 

Thanks

Tools. Drivers in drivers64UEFI are started automatically, and I doubt you want that.


Hi, I want to use AptioMemoryFix with native nvram on my GA-Z390 Aorus Master/i9-9900k, but I found a strange problem: once I install this driver into clover/drivers64UEFI and remove any aptiofix*/emunvram driver, my mac halts for about 30-60 seconds on every reboot/shutdown.

 

so I booted in verbose mode, and on reboot I got this "TLB invalidation IPI timeout" panic.

 

no matter which smc I'm using (fakesmc/virtualsmc), and no matter how I change the clover dsdt patches, it always gives this panic on every reboot/shutdown.

 

when I switch to OsxAptioFixDrv-64.efi, this panic is gone.

 

so what's wrong?

屏幕快照 2018-12-17 上午10.56.27.png


I think I know why AptioMemoryFix causes this problem while AptioFix1 and EmuVar avoid it. The shims/SetVariable runtime are located in physical memory that is outside the kernel space (since boot.efi is prevented from relocating them), and from Pene's panic you can see that the VM map is being destroyed. This panic is caused by the translation lookaside buffer (TLB), which handles virtual-to-physical address translation, waiting to be invalidated on all CPUs. That means the shims/SetVariable runtime no longer have a valid virtual address translation, because they are gone from the memory map and the TLB cache. This is a problem because these panics are not happening on CPU 0, where any calls to the EFI runtime should be taking place. That CPU is probably stuck in a loop trying to hit the TLB cache, as you can see happens in the pmap_flush_tlbs function, which is probably called when a miss occurs (or it is another function handling a page fault; I didn't look closely enough at which code is actually invoked when this happens). Either this takes much longer than the timeout deadline, or the VM map is locked, which prevents the lookup/page fault from even attempting to work out which page information to cache in the TLB.

 

EDIT: I believe this would happen with AptioFix2 as well without EmuVar. However, it would probably result in a page fault for trying to write to write-protected memory instead.

EDIT2: AptioFix2 and EmuVar may give the same panic though since the native nvram would not be used and there would be no page fault for write protection.

EDIT3: Also, yes, setvariable is only called at shutdown/reboot when the information has changed. The man page for nvram says:

Changes to NVRAM variables are only saved by clean restart or shutdown.

EDIT4: Pene's other panics provide much more information that leads me to believe that this is indeed the case. The second image appears to show the problem pretty well: the interrupt fails. The third image also has a pretty telling message: "no mapping exists for frame pointer".

EDIT5: EFI runtime doesn't have to be called from cpu 0 if the os has a locking mechanism to prevent multiple accesses at once, which it must since it's called from different cpus in each of the panics.

Edited by apianti


While this theory does sound interesting, I am afraid it is not correct. The UEFI memory map does include both the shims and the other areas, so it is not correct to assume that the virtual address is unmapped, as it is reserved by XNU.

 

The panic literally means that one of the cores stalled within UEFI runtime, and while the rest of the system reached userspace unmapping, the UEFI runtime thread still did not finish its job.

 

Most likely it is infinitely waiting for SMI.
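To make that reading of the panic concrete, here is a toy model of a TLB shootdown: the initiating core waits for every other core to acknowledge, and gives up after a deadline. The names and the tick-based deadline are invented for this sketch and do not mirror the XNU source; the point is only that one core stuck elsewhere (for example inside a runtime call that never returns) is enough to trigger the timeout.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool Stalled;       /* core is busy and never services the IPI */
    bool Acknowledged;  /* core has flushed its TLB */
} Cpu;

/* Returns the index of the first core that failed to acknowledge within
   Deadline ticks, or -1 if all cores responded in time. */
int FlushTlbs(Cpu *Cpus, size_t Count, unsigned Deadline) {
    size_t i;
    unsigned Tick;

    for (i = 0; i < Count; i++) Cpus[i].Acknowledged = false;

    for (Tick = 0; Tick < Deadline; Tick++) {
        bool AllDone = true;
        for (i = 0; i < Count; i++) {
            /* A healthy core handles the request on the next tick. */
            if (!Cpus[i].Stalled) Cpus[i].Acknowledged = true;
            if (!Cpus[i].Acknowledged) AllDone = false;
        }
        if (AllDone) return -1;
    }

    /* Deadline expired: report the first silent core. */
    for (i = 0; i < Count; i++)
        if (!Cpus[i].Acknowledged) return (int)i;
    return -1;
}
```

In this model the reported core is simply whichever one is stalled, with no requirement that it be CPU 0.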


well, with my board, it's not possible to use AptioFix2/3, they always failed to start, but with AptioMemoryFix, I only got this panic when I try to reboot/shutdown mac. but it boots fine.

 

so, is there a possible solution that I can help test?

5 hours ago, vit9696 said:

While this theory does sound interesting, I am afraid it is not correct. The UEFI memory map does include both the shims and the other areas, so it is not correct to assume that the virtual address is unmapped, as it is reserved by XNU.

 

The panic literally means that one of the cores stalled within UEFI runtime, and while the rest of the system reached userspace unmapping, the UEFI runtime thread still did not finish its job.

 

Most likely it is infinitely waiting for SMI.

 

Did you read the functions in Pene's panic backtrace? I went through them, and the place where it panics is exactly where it has destroyed the VM map and is flushing the TLB, waiting on the other CPUs to flush their TLBs as well. This is a race condition, which is why there are multiple different panics. If it were in an SMI, this panic could not happen, because an SMI prevents any other core from running outside SMM:

Operations in SMM take CPU time away from the applications, operating system kernel and hypervisor, with the effects magnified for multicore processors since each SMI causes all cores to switch modes.

Therefore it would have to either have returned, never entered, or the panic would be that there was a timeout for the SMI and would have the text "NMIPI for unresponsive processor: interrupt watchdog for vector ...".

Edited by apianti

4 hours ago, steve3d said:

well, with my board, it's not possible to use AptioFix2/3, they always failed to start, but with AptioMemoryFix, I only got this panic when I try to reboot/shutdown mac. but it boots fine.

 

so, is there a possible solution that I can help test?

 

Hi

 

I found a guide for your amazing config (GA-Z390 Aorus Master/i9-9900k) via reddit. Waiting for a solution with AptioMemoryFix.

 

https://github.com/cmer/gigabyte-z390-aorus-master-hackintosh

Edited by Matgen84

9 minutes ago, Matgen84 said:

 

Hi

 

I found a guide for your amazing config (GA-Z390 Aorus Master/i9-9900k) via reddit. Waiting for a solution with AptioMemoryFix.

 

https://github.com/cmer/gigabyte-z390-aorus-master-hackintosh

 

Ok, well, this guide shows that AptioFix2-free2000 (such a bad driver...) and EmuVar work, which means you can check whether this panic still happens with them or whether it is related only to preventing the relocation of runtime code regions.


well, unfortunately, I've read this guide somewhere before, and my first config was almost 90% similar to it. but my situation is a little different, because this board only has one hdmi 1.4 output, so my 4k monitor is useless with the igpu and I have to use my old nv 970 card as dgpu. that's why I'm still stuck with 10.13.6.

 

and with these settings, the AptioFix2-free2000 driver also gives me an allocation error and cannot boot at all.

 

but all the config.plist files so far for this ga-z390 aorus master have one common big mistake: this board comes with CNVi wifi/bluetooth. the bluetooth works only if you boot into windows first and then reboot into mac. of course that is without instant hotspot and continuexxx (I have the chinese version, so I forgot how to spell this correctly). so the problem with this cnvi bluetooth is that on a cold boot into mac there is no way to upload the firmware.

 

 

so this random hangup only happens when I use AptioMemoryFix, with or without nvramemu, it always hangs when reboot/shutdown.

 

the core that causes this TLB error is totally random.

 

anyway, thank you Matgen84 and apianti. :)

 

I can use this bluetooth to pair with headphones, an iphone, and android. So I'm quite sure this bluetooth works. why would you need a 3rd-party 4.2/4.1/4.0 bluetooth adapter when you can have bluetooth 5.0?


Peripherals → USB Configuration → XHCI Hand-off : Enabled

Peripherals → USB Configuration → Legacy USB Support : Auto

 

and another strange thing: almost every guide says I need XHCI hand-off set to enabled. but right now I have it disabled, and I also set legacy usb support to disabled. nothing bad happens. and yesterday I found that no matter how I set these two bios settings, EVERY USB port works.

 

in case you might be interested, I'm attaching my config.plist and bootlog.txt here. I don't have any ssdt/dsdt patches, because I don't know how and what to patch.

bootlog.txt

 

config.plist

Edited by steve3d


@apianti, well, obviously it is not (constantly) in SMM; it just constantly busy-loops in ring 0 within SmmCommProtocol->Communicate (you may check its src). I should have been slightly clearer. The functions are pretty much irrelevant, though I am aware of them. Unmapping of the UEFI runtime cannot happen.

 

For all the others: you have no other choice but AptioMemoryFix + EmuVariableEfi. Other insanity will doom you.

Edited by vit9696


In this case, as you say, it's not constantly in SMM. Even in an infinite loop, an NMI still interrupts (as does any other interrupt, since these are generated in hardware by the APIC), meaning that the loop could still be interrupted by a page fault or by the TLB invalidation interrupt. The kernel and its interrupts operate in ring 0, so that has nothing to do with it; otherwise there would be a general protection fault for any interrupt taken during EFI runtime code, which the panic backtrace shows is not the case. This is needed because the code an interrupt uses may itself be unmapped and raise a page fault or a similar fault; it is by design. You are wrong about the unmapping of UEFI regions: look at hibernate_newruntime_map, which removes the mappings for the previous UEFI regions from the kernel map and maps new ones. This is the same map that is destroyed by vm_map_destroy, where all mappings are removed except the kernel. Read the kernel source, specifically the functions listed in the panic backtrace. The second panic even shows a call to pal_efi_call_in_64bit_mode, the wrapper around the call to the EFI function that is mysteriously missing from the call stack. That call is interrupted by a local APIC interrupt, then by another interrupt, and ends in the TLB flush timeout panic. You can search through the kernel source and see that if the problem were anything other than a missing mapping for the virtual address in the TLB, there would be a different outcome.


I have another old computer with a z97 board and an old xeon v3 cpu. I've tried it with the same drivers64UEFI setup, and this computer doesn't have the panic problem.

 

could it be the uefi bios? I see gigabyte released a new bios a few days ago, I will try it and report back.

and if this happens again, how can I get this panic log in text? 

 

I only posted about half of the panic log. another half can not be seen clearly with my iphone's normal video recording.

 

use slow-motion video recording?

Edited by steve3d


About to update one of my systems from Z370 to Z390. What is the current best practice for Aptio and Z390, pending hopeful fixes to AptioMemoryFix?

Thanks,

g\
