Lion kernel testing on AMD (don't ask help here: use the Help Topic)
Started by ham4ever, Dec 16 2012 11:10 PM
552 replies to this topic
#81
Posted 27 December 2012 - 09:05 AM
Like I said: I don't think it is the kernel...
It's weird...
#82
Posted 27 December 2012 - 10:06 AM
Hi, Andy!
I'm also convinced it's not the kernel itself: it boots just fine. Something is missing in the kernel that prevents userland processes from spawning in 64-bit mode on AMD machines. I used to think it was an SSSE3-related issue, thanks to an old paper written by David Elliott (dfe), but we have SSSE3 emulation now, so what else could it be? I still think it's a CPUID issue elsewhere in the kernel that's preventing us from loading the userland.
The obvious thing is to investigate kern_exec.c and mach_loader.c (and .h), but there's no reason the CPUID issue cannot occur elsewhere and prevent the userland from running, even if the kernel boots fine. That was the issue Sinetek dealt with to make his 64-bit Snow Leopard kernel a winner, and he couldn't repeat his success with Lion, just like us.
Thank you for the time you're investing in it.
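For anyone who wants to see what a CPUID check actually looks at, here is a minimal user-space sketch. It is not taken from the XNU source. The function names are made up for illustration, and the live query assumes a GCC or Clang toolchain that ships <cpuid.h>. The vendor string comes from leaf 0 (registers EBX, EDX, ECX, in that order) and the SSSE3 flag is bit 9 of ECX in leaf 1.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header that provides __get_cpuid() */
#endif

/* Copy one 32-bit register into four characters, lowest byte first. */
static void demo_put_le32(char *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((v >> (8 * i)) & 0xff);
}

/* Hypothetical helper: build the 12-character vendor string from the
 * registers returned by CPUID leaf 0. The order is EBX, EDX, ECX. */
void cpu_vendor_from_regs(uint32_t ebx, uint32_t edx, uint32_t ecx,
                          char out[13])
{
    demo_put_le32(out + 0, ebx);
    demo_put_le32(out + 4, edx);
    demo_put_le32(out + 8, ecx);
    out[12] = '\0';
}

/* Hypothetical helper: SSSE3 is reported in bit 9 of ECX for CPUID leaf 1. */
int has_ssse3(uint32_t leaf1_ecx)
{
    return (int)((leaf1_ecx >> 9) & 1u);
}

#if defined(__x86_64__) || defined(__i386__)
/* Query the running CPU. Returns 0 on success, -1 if a leaf is unsupported. */
int query_cpu(char vendor[13], int *ssse3)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return -1;
    cpu_vendor_from_regs(ebx, edx, ecx, vendor);

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    *ssse3 = has_ssse3(ecx);
    return 0;
}
#endif
```

On an AMD machine query_cpu() should fill in "AuthenticAMD"; code that only expects "GenuineIntel" is the kind of place where a vendor-specific assumption could hide.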
Hey, Delta! Your debug version is very promising. I think it can be made even more accurate. Say you did something like this:
{
printf("exec_add_user_string() started\n");
int error = 0;
I think it's cool to know when each function starts, but it would be even better to know which value each one returns, which task it actually performs, or what each statement produces. Something like this:
printf("the xxxxxx function returned the value %d\n", ERROR);
return ERROR;
}
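The idea of logging both the start and the return value could be sketched roughly as follows. This is a user-space illustration, not the real kern_exec.c code: TRACE_RETURN and demo_add_user_string are hypothetical names, and the kernel's own printf would be used in place of the stdio one.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper macro: print the name of the current function and the
 * value it is about to return, then return that value. The printf comes
 * before the return, otherwise it would never be reached. */
#define TRACE_RETURN(val)                                      \
    do {                                                       \
        int trace_rv_ = (val);                                 \
        printf("%s() returned %d\n", __func__, trace_rv_);     \
        return trace_rv_;                                      \
    } while (0)

/* Stand-in for a kernel routine; only the tracing pattern matters here. */
int demo_add_user_string(const char *str)
{
    printf("%s() started\n", __func__);

    if (str == NULL)
        TRACE_RETURN(EINVAL);   /* error path, logged with its value */

    TRACE_RETURN(0);            /* success path, logged the same way */
}
```

Every exit then shows up in the boot log with its value, so a nonzero return is easy to tell apart from a normal one.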
By the way, there's lots of good info already in the debug version you made. Notice this:
goto bad_notrans; - 1
goto bad_notrans; - 2
exec_check_permissions() started
pal_kernel_announce() started
goto bad; - 1
calling mountroot_post_hook
calling mountroot_post_hook (again)
bsd_init() done?
goto bad; - 2
goto bad; - 3
in the for loop now...
exec_mach_imgact() started
in the for loop now...
exec_fat_imgact() started
goto bad; - 3
in the for loop now...
exec_mach_imgact() started
exec_add_user_string() started
exec_apple_strings() started
exec_add_user_string() started
exec_add_user_string() started
Setting security token
goto again; - 1
bad:proc_transend(p, 0);
bad_notrans: returning error
check_for_signature() started
skipping KERN_FAILURE
proc_lock(p)
proc_unlock(p)
switch_protect
Err: 0
end of bsdinit_task()?
I would build a version myself with the suggestions I made, but my Xcode suddenly stopped working, so I'll have to reinstall everything here.
Thank you all, guys, for your effort!
#83
Posted 27 December 2012 - 10:34 AM
theconnactic, on 27 December 2012 - 10:06 AM, said:
Hi, people!
It's not the kernel itself: it boots just fine. Something is missing in the kernel that prevents userland processes from spawning in 64-bit mode on AMD machines. I used to think it was an SSSE3-related issue, thanks to an old paper written by David Elliott (dfe), but we have SSSE3 emulation now, so what else could it be? I still think it's a CPUID issue elsewhere in the kernel that's preventing us from loading the userland. The obvious thing is to investigate kern_exec.c and mach_loader.c (and .h), but there's no reason the CPUID issue cannot occur elsewhere and prevent the userland from running, even if the kernel boots fine. That was the issue Sinetek dealt with to make his 64-bit Snow Leopard kernel a winner, and he couldn't repeat his success with Lion, just like us.
Yes, that is our problem. However, I think I've investigated the whole of kern_exec.c, and it looks like it runs just fine.
I'll take a look at mach_loader.c later.
#84
Posted 27 December 2012 - 10:44 AM
Delta, I edited my post: take time to read it before doing anything with mach_loader, if you can.
I think kern_exec.c is not running as it should: it's returning errors (the "bad" function) where it should not. I think we should investigate why it's acting like that and correct the issues. Only after that should we focus on another file. Or maybe solving these issues will necessarily take us to mach_loader.c or some other file, who knows?
#85
Posted 27 December 2012 - 10:55 AM
theconnactic, on 27 December 2012 - 10:44 AM, said:
Delta, I edited my post: take time to read it before doing anything with mach_loader, if you can.
I think kern_exec.c is not running as it should: it's returning errors (the "bad" function) where it should not. I think we should investigate why it's acting like that and correct the issues. Only after that should we focus on another file. Or maybe solving these issues will necessarily take us to mach_loader.c or some other file, who knows?
Thanks for the great idea! I'll add return values and remove some unnecessary info...
Will post another diff & kernel soon!
EDIT: And btw, for example:
goto bad_notrans; - 1
means we got PAST the first goto bad_notrans;
I should have made the messages a bit clearer...
EDIT2: lion-test-21 compiled: http://www.solidfile...m/d/fcf9be63ed/
Diff coming soon...
Diff: http://www.solidfile...m/d/ce042eba5a/
#86
Posted 27 December 2012 - 11:25 AM
Deltac0, on 27 December 2012 - 10:55 AM, said:
means we got PAST goto bad_notrans; (first of them) 
Delta, don't you see? These "bad" functions aren't supposed to be reached at all! If we're getting past them, it means the errors that justify them are happening. They should have been skipped altogether. Yet, take a look at the code: the "bad" function won't hang all processes at the scene of the crime; its output, though, could perhaps prevent some important process from running later.
About the newest debug kernel, I'm going to test it now.
#87
Posted 27 December 2012 - 11:30 AM
theconnactic, on 27 December 2012 - 11:25 AM, said:
Delta, don't you see? These "bad" functions aren't supposed to be reached at all! If we're getting past them, it means the errors that justify them are happening. They should have been skipped altogether. Yet, take a look at the code: the "bad" function won't hang all processes at the scene of the crime; its output, though, could perhaps prevent some important process from running later.
About the newest debug kernel, I'm going to test it now.
Ahh, now I get it... xD
It goes to bad and bad_notrans at some point... Needs more debugging.
#88
Posted 27 December 2012 - 11:30 AM
P.S.: No, I'm not suggesting that we artificially skip them or remove them from the source. Instead, they tell us about issues that are happening, so we'd better take a look at them and fix them, and hopefully that will get us one step further. Too bad my Xcode is screwed.
#89
Posted 27 December 2012 - 11:31 AM
theconnactic, on 27 December 2012 - 11:30 AM, said:
P.S.: No, I'm not suggesting that we artificially skip them or remove them from the source. Instead, they tell us about issues that are happening, so we'd better take a look at them and fix them, and hopefully that will get us one step further. Too bad my Xcode is screwed.
Yeah, it's bad to just skip them... Like we tried with the EACCES error... However, I can't even get that far anymore...
EDIT: Too bad that "bad" doesn't take any arguments; the code just jumps to it from somewhere... I added some messages to find the exact point, like this:
if (--iterlimit == 0) {
    printf("Going to bad (4)\n");
    error = EBADEXEC;
    goto bad;
}
lion-test-22: http://www.solidfile...m/d/e5a4e695a6/
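Since "bad" cannot take arguments, one way to avoid numbering every message by hand is a small macro that records the source line before jumping. This is only a sketch under my own assumptions: GOTO_BAD and demo_activate are hypothetical names, and ENOEXEC stands in for the Darwin-specific EBADEXEC so that the example builds on any system.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical macro: set the error, print where the jump comes from, then
 * jump. It expects a local variable "error" and a label "bad" in the
 * enclosing function, which is how the snippet above is laid out. */
#define GOTO_BAD(err)                                          \
    do {                                                       \
        error = (err);                                         \
        printf("%s: goto bad from line %d, error %d\n",        \
               __func__, __LINE__, error);                     \
        goto bad;                                              \
    } while (0)

/* Stand-in for a retry loop with an iteration limit. */
int demo_activate(int iterlimit, int retries_needed)
{
    int error = 0;

    while (retries_needed-- > 0) {
        if (--iterlimit == 0)
            GOTO_BAD(ENOEXEC);   /* portable stand-in for EBADEXEC */
    }

bad:
    return error;
}
```

With this, a jump to "bad" that prints nothing must come from a plain "goto bad;" that was missed, or from simply falling through to the label.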
#90
Posted 27 December 2012 - 11:43 AM
Even more important would be knowing if and where else the outputs of the "bad" functions are used. Are the "bad" functions being called somewhere else?
#91
Posted 27 December 2012 - 11:49 AM
theconnactic, on 27 December 2012 - 11:43 AM, said:
Even more important would be knowing if and where else the outputs of the "bad" functions are used. Are the "bad" functions being called somewhere else?
I think the "bad" functions are just like functions inside another function: if the "main" function does something wrong, the code skips to the "bad" part of the function.
The "bad" function I'm trying to figure out is located in kern_exec.c -> load_init_program() (the function that calls launchd).
I added those messages to all the "goto bad;" parts (gotta double-check), but it still goes to bad without giving me any of those "going to bad (x)" messages, so it must be called from outside?
This is damn weird...
EDIT: I'm sorry, I meant the exec_activate_image() function...
EDIT2: And the return of the "bad" function is just like the return of its main function? That's how I understand it.
EDIT3: I gotta go now, I'll be back in a few hours.
#92
Posted 27 December 2012 - 12:46 PM
Thank you, Delta!
Andy, any idea how relevant this "bad" function could be? I'm looking at the source and found it nowhere but in kern_exec.c. Perhaps the search tool here is malfunctioning...?
Deltac0, on 27 December 2012 - 11:49 AM, said:
EDIT2: And the return of "bad" function is just like the return of it's main function? That's how I understand it.
Maybe the main function returns the value of "bad" when certain conditions are not met. So when the main function is called elsewhere, it will give the value of "bad", and perhaps this would hang the processes.
#93
Posted 27 December 2012 - 01:46 PM
theconnactic, on 27 December 2012 - 12:46 PM, said:
Maybe the main function returns the value of Bad when certain conditions are not met. So when the main function is called elsewhere, it will give the value of bad and perhaps this would hang the processes.
Exactly what I was thinking. It must be done this way...
If only we could build a verbose launchd? Or something to see if the code even tries to run it?
Okay, new kernel. This one has more specific debug messages about those "bad" functions, like this:
bad:
    printf("We are in bad of exec_mach_imgact()\n");
    return(error);
}
lion-test-23: http://www.solidfile...m/d/9d673a70ae/
EDIT: How is this possible? The kernel seems to execute most (if not all) of the "bad" functions... Still needs some more work.
EDIT2: Meklort shared his wisdom in IRC... Bad functions will be executed. The problem is somewhere else... Or something.
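One way to read Meklort's remark, assuming these routines follow the usual C cleanup-label idiom: "bad:" is a label inside the function, not a function of its own, so the success path falls through to it as well. A message printed under the label therefore proves nothing by itself; only the value of error tells the two paths apart. The sketch below is a user-space illustration with hypothetical names, not the real exec_mach_imgact().

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in routine: reached_bad is set whenever execution arrives at the
 * label, so the caller can see that both paths end up there. */
int demo_imgact(int fail_early, int *reached_bad)
{
    int error = 0;

    if (fail_early) {
        error = EACCES;
        goto bad;            /* error path: jump straight to the label */
    }

    /* ... the normal work of the function would happen here ... */

bad:
    /* The success path arrives here too, with error still 0. */
    *reached_bad = 1;
    printf("in bad of %s(), error = %d\n", __func__, error);
    return error;
}
```

Printing the error value under the label, as above, is what separates a harmless fall-through from a real failure.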
#94
Posted 27 December 2012 - 04:42 PM
We have some kind of progress, maybe...
If you're running AMD SL / Lion (or secretly even ML):
1. Download this: http://www.solidfile...m/d/428fa4efbc/
2. sudo su in terminal
3. chmod +x tiny
4. ./tiny
5. Post here what happened.
#95
Posted 27 December 2012 - 05:54 PM
Deltac0, on 27 December 2012 - 04:42 PM, said:
We have some kind of progress, maybe...
If you're running AMD SL / Lion (or secretly even ML):
1. Download this: http://www.solidfile...m/d/428fa4efbc/
2. sudo su in terminal
3. chmod +x tiny
4. ./tiny
5. Post here what happened.
What does this do?
#96
Posted 27 December 2012 - 06:11 PM
Hi, Andy!
It creates a static Mach-O executable (that is, one that does not use dyld).
We intend to replace launchd with it, to see what the effect is.
This binary executable must also be able to run on an AMD machine, otherwise the experiment is DOA.
Best regards.
#97
Posted 27 December 2012 - 06:13 PM
Andy Vandijck, on 27 December 2012 - 05:54 PM, said:
What does this do?
It's just an ultra-small Mach-O executable:
http://osxbook.com/b...h-o-executable/
nicertiny.asm.
Meklort told us to test whether the kernel starts launchd with that. It doesn't need dyld, so it rules dyld out...
but I get "Illegal instruction" when running nicertiny on my AMD...
I changed /sbin/launchd to /tiny in the source, put tiny in the root of the HDD and booted. I got a panic!
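To double-check that a test binary really does not ask for dyld, one could scan its load commands for LC_LOAD_DYLINKER. The following is a rough sketch under stated assumptions: demo_needs_dyld is a hypothetical name, the constants are copied from Apple's <mach-o/loader.h> so it builds on any system, and it only handles a thin (non-fat) image in the host's byte order that has already been read into memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as defined in <mach-o/loader.h>, renamed to avoid clashes. */
#define DEMO_MH_MAGIC          0xfeedfaceu  /* 32-bit Mach-O header */
#define DEMO_MH_MAGIC_64       0xfeedfacfu  /* 64-bit Mach-O header */
#define DEMO_LC_LOAD_DYLINKER  0xeu         /* "load this dynamic linker" */

/* Returns 1 if the image requests a dynamic linker, 0 if it does not,
 * and -1 if the buffer is not a thin native-endian Mach-O or is truncated. */
int demo_needs_dyld(const unsigned char *buf, size_t len)
{
    uint32_t magic, ncmds, cmd, cmdsize;
    size_t off;

    if (len < 28)
        return -1;
    memcpy(&magic, buf, 4);
    if (magic == DEMO_MH_MAGIC)
        off = 28;                 /* size of the 32-bit header */
    else if (magic == DEMO_MH_MAGIC_64)
        off = 32;                 /* 64-bit header has one extra field */
    else
        return -1;
    if (len < off)
        return -1;

    memcpy(&ncmds, buf + 16, 4);  /* ncmds is the fifth 32-bit field */

    for (uint32_t i = 0; i < ncmds; i++) {
        if (len - off < 8)
            return -1;            /* no room for cmd and cmdsize */
        memcpy(&cmd, buf + off, 4);
        memcpy(&cmdsize, buf + off + 4, 4);
        if (cmd == DEMO_LC_LOAD_DYLINKER)
            return 1;
        if (cmdsize < 8 || cmdsize > len - off)
            return -1;            /* malformed load command */
        off += cmdsize;
    }
    return 0;
}
```

A result of 0 for tiny and 1 for the stock launchd would be consistent with the plan above, though it says nothing about why tiny itself hits an illegal instruction.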
#98
Posted 27 December 2012 - 06:16 PM
Good plan... then we can see if it is dyld
#99
Posted 27 December 2012 - 06:26 PM
Andy Vandijck, on 27 December 2012 - 06:16 PM, said:
Good plan... then we can see if it is dyld 
but we both get "Illegal instruction" when trying to run nicertiny...
I tried to boot with it: panic... Most likely it's somehow related to the illegal instruction we get when it's run from the terminal.
But now we know that the kernel DOES start launchd.
Probably something about dyld.
#100
Posted 27 December 2012 - 06:59 PM
64-bit kernel, Delta?
Maybe it's just the dyld indeed... that would be good news.