sse3 -> sse: recompiler possible?

iblue · September 2, 2005

Hi,

I've got an Athlon XP 1800+, so I have no way to run OSx86.

My idea is to write a "recompiler" that searches the binaries for SSE2/SSE3 instructions and replaces them with other instructions that do the same thing. If you replace an instruction like "addsubpd" with an opcode of the same length like "call 0xdeadbeef" (plus additional NOPs where needed; they are both 3 bytes long, aren't they?), you wouldn't even need to modify any jump addresses. The only problem is loading the replacement code at 0xdeadbeef for every program.
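A quick way to check the length question is to write the bytes out. This is only a minimal sketch: the encodings are the ones I understand from the Intel instruction reference (ADDSUBPD is 66 0F D0 /r, a near CALL with a 32-bit displacement is E8 plus four bytes), and the helper names are made up for illustration.

```python
import struct

# Assumed encodings (Intel instruction set reference):
#   ADDSUBPD xmm1, xmm2/m128  ->  66 0F D0 /r
#   CALL rel32                ->  E8 cd
ADDSUBPD_XMM0_XMM1 = bytes([0x66, 0x0F, 0xD0, 0xC1])  # ModRM C1 = xmm0, xmm1


def encode_call_rel32(site: int, target: int) -> bytes:
    """Encode a near CALL located at `site` that transfers to `target`."""
    rel = target - (site + 5)  # displacement is relative to the next instruction
    # Mask to 32 bits so the displacement wraps the way 32-bit addresses do.
    return b"\xE8" + struct.pack("<I", rel & 0xFFFFFFFF)


def patch_in_place(original: bytes, site: int, target: int) -> bytes:
    """Return a same-length replacement (CALL padded with NOPs), or fail."""
    call = encode_call_rel32(site, target)
    if len(call) > len(original):
        raise ValueError(
            f"CALL needs {len(call)} bytes, instruction has {len(original)}"
        )
    return call + b"\x90" * (len(original) - len(call))
```

With register operands the SSE3 instruction comes out at 4 bytes and the call at 5, so under these assumptions the in-place swap only fits forms that carry extra addressing bytes. Shorter instructions would need a shorter trap (a 2-byte INT n, for example) or the code would have to be relocated after all.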

Comments?

cyrana · September 2, 2005

This may be possible, but it'd be insanely slow. SSE3 is only used for a few things (mostly to try to lock out non-dev systems), and it isn't even a requirement in the development guidelines. I'd imagine a lot more SSE2 instructions are used, and SSE2 is a LOT better than SSE.

Kryton · September 3, 2005

I've been thinking about this. It could be done by writing a small TSR boot loader that loads before the actual OS does and places itself at the top of conventional memory. You could then trap invalid opcodes by setting up an exception handler, so that when the OS executes them they are replaced with equivalent code.

It would be tricky to do because:

- You must use BIOS calls only

- You have a performance hit by using exceptions

- You need to be careful with register usage (MMX can be used in parallel with SSE, so some registers may already be in use)

This isn't to say it cannot be done, and if anyone is interested in trying it I'd welcome a PM from them. I have a bit of documentation on the subject and would be game to help in any way possible.

Kryton

ardosdev · September 4, 2005

I'm all for a patch like this; even if it means running OSx86 slowly, it's still better than not at all.

Another idea may be an SSE2 emulation layer.

I might be able to help; I know assembly language, and can get documentation on these instruction sets.

Kryton · September 4, 2005

I think a lot could be ripped from Bochs (which is fine, provided the result is also released under the GPL), so an entire SSE/SSE2 implementation does not need to be written from scratch.

What would need writing is:

- Small floppy boot loader (for now) that can load a micro-kernel

- Micro-kernel loads to the top of conventional memory and alters BIOS conventional memory settings so it is invisible to the OS

- Micro-kernel sets up an INT 6 interrupt handler to catch Invalid Opcode exceptions

- When the interrupt handler is called it uses the Bochs code to translate the SSE instruction to vanilla x86 (or 3DNow! etc.) and executes it. The address of the faulting opcode is pushed onto the stack by the processor, so it can easily be read and fed to the standard Bochs decoder code.

- Once this is all finished the handler resumes execution at the next instruction

- Oh and we must start the OS by processing the MBR.

This stuff is all off-the-shelf ASM and can be grabbed in most places. The loader is basically a mini-OS, and interrupt handlers are similarly commonplace (the examples are generally keyboard handlers, but the theory is the same). Also, Bochs contains a lot of code that could be re-used in any implementation.
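To make the handler steps above concrete, here is a toy model of the flow in plain Python. It is not a real INT 6 handler and borrows nothing from Bochs: the function names, the register model (each XMM register as a pair of doubles) and the restriction to the register-register form of ADDSUBPD are all assumptions made for illustration.

```python
# Toy model of the trap-and-emulate flow; XMM registers are [low, high] pairs.

def decode_addsubpd(code: bytes, ip: int):
    """Decode 66 0F D0 /r in its register-register form only.

    Returns (dest, src, length), or None if the bytes are something else.
    """
    if code[ip:ip + 3] != b"\x66\x0F\xD0" or ip + 3 >= len(code):
        return None
    modrm = code[ip + 3]
    if modrm >> 6 != 0b11:  # memory operands are not handled in this sketch
        return None
    return (modrm >> 3) & 7, modrm & 7, 4


def invalid_opcode_handler(code: bytes, fault_ip: int, xmm: list) -> int:
    """Emulate the faulting instruction and return the address to resume at."""
    decoded = decode_addsubpd(code, fault_ip)
    if decoded is None:
        raise RuntimeError("genuine invalid opcode, pass it on to the OS")
    dest, src, length = decoded
    # ADDSUBPD subtracts in the low lane and adds in the high lane.
    xmm[dest] = [xmm[dest][0] - xmm[src][0], xmm[dest][1] + xmm[src][1]]
    # The saved instruction pointer of an invalid-opcode fault points at the
    # faulting instruction itself, so the handler has to step over it.
    return fault_ip + length
```

For example, with xmm0 = [5.0, 1.0] and xmm1 = [2.0, 3.0], handling "addsubpd xmm0, xmm1" leaves xmm0 = [3.0, 4.0] and resumes four bytes further on. A real handler would also have to save and restore whatever registers it touches, which is where the register-usage worry comes in.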

I am game to do this, but I don't have much low-level ASM experience, so it seems a bit of a mammoth challenge to me.

Problems are:

- Writing it is quite a nasty low-level task (as I've mentioned) due to the reliance on BIOS calls only.

- The OS may try to overwrite the INT 6 interrupt handler with its own to provide debugging/stability in the OS. This needs to be averted somehow at the lowest level, because once execution has entered any of the higher rings the handler can no longer be changed. (Virus code is probably the place to look for ways of dealing with this; the older DOS ones used to have nasty tricks for handling this problem.)

iblue · September 5, 2005

Ok, ToDo:

-Find out what all the SSE2/SSE3 instructions do.

--Get code "out of the Bochs" to emulate SSE2/SSE3

-Find out how to write an exception handler on x86

-Find out how to chain-load the Darwin kernel

--Replace the boot sector with our exception handler and move the original boot sector to another place

---Problem: Where is free space?

--Use boot-floppy

---Problem: Need time-travel device to get a floppy

----Solution: Use bootable CD-R

Isn't there an SSE3-on-SSE2 emulator? Could we use it for emulating SSE2?

Any suggestions?

cyrana · September 5, 2005

I really think this is a waste of time. But if someone wants to, they should feel free. As I already said, it would be SO MUCH slower than going from SSE3 to SSE2. And there is a lot of SSE2 code in any x86 app (not so with SSE3 code) due to Apple programming guidelines.

QEMU emulates SSE2, I think; at least you can run it on an SSE machine and then install OS X in it (uber slow, though).

blahsucks · September 5, 2005

That would be highly unreliable, though. If you screw something up, you could crash the entire system/overwrite memory/overheat the PC and things like that. You would need a test machine.

firebush05 · September 5, 2005

@iblue:

--Use boot-floppy

---Problem: Need time-travel device to get a floppy

I've heard this crazy rumor that some PCs actually still use those on occasion! I know, I know, it sounds a bit far-fetched... I mean, in 2005, what kind of OS would ACTUALLY require a floppy these days!

Time-travel device is definitely in order.

iblue · September 5, 2005

@cyrana:

Why do you think the emulation would be slow? How many SSE instructions are used? 0.1%? 1%?

A little calculation for 1%: suppose we need 100 instructions to emulate 1. Then every 100 original instructions would become about 200 (99 native ones plus 100 for the single emulated one).

On a 2 GHz machine this would mean you have the same speed as a 1 GHz machine without emulation. Not über-slow. And this is the worst case...
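The same estimate as a small function, so other percentages can be tried. It is only a rough sketch: it treats every native instruction as costing 1, and both the fraction of emulated instructions and the cost per emulated instruction are guesses. An exception-based trap could well cost far more than 100 instructions per hit.

```python
def slowdown(emulated_fraction: float, cost_per_emulated: float) -> float:
    """Average cost per original instruction, with native instructions costing 1."""
    native_part = 1.0 - emulated_fraction
    emulated_part = emulated_fraction * cost_per_emulated
    return native_part + emulated_part


# 1% emulated at 100 instructions each: roughly a factor of 2.
factor = slowdown(0.01, 100)
# Equivalent clock speed of a 2 GHz machine under that slowdown, in GHz.
effective_ghz = 2.0 / factor
```

With 0.1% emulated the factor drops to about 1.1, while 1% emulated at 1000 instructions each gives about 11, so the result depends almost entirely on the two guessed inputs.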

@blahsucks:

Bochs for debugging? VMware for real testing?

Kryton · September 7, 2005

@cyrana:

QEMU is slow because it also emulates every other instruction: instead of just handling SSE as we would, it handles every mov, add, sub, mul, etc., multiplying the raw instruction count by probably a factor of at least 3. Since we would only be emulating SSE instructions (which should be relatively few in total), there may be a performance hit, but it will not be major. (Another idea, thinking about this: could we get Bochs to translate the code from SSE to non-SSE?)

@blahsucks:

Yes, it is unreliable in the sense that low-level code is required, but no more so than what OS X already does to handle similar things. As someone mentioned above, VMware lets certain code run directly on the CPU, so we can easily use it for early test runs (internally it has an emulated virtual BIOS and ring 0, to deal with the issue of running in ring 3 on Windows). And this isn't that "dangerous": we are merely setting up an exception handler and calling some equivalent code, not fiddling with the OS disk routines or messing with its interrupts (well, aside from INT 6).

crumpo · September 9, 2005

here is a link for an mmx emulator:

http://www-sop.inria.fr/geometrica/team/Sy.../progs/mmx-emu/

maybe this has some useful bits for the skilled ones of you

crumpo

Kryton · September 10, 2005

Okay, the MMX thing runs under Linux, so it should be possible to port it to Darwin easily.

Seeing this, I wonder whether we need the low-level ASM stuff at all. Perhaps it is possible to do what this does and run any program (from within Darwin) through a "translator"?

darkhooda · September 10, 2005

Since both Bochs and QEMU are GPLed, theoretically you can just remove the emulation code for everything EXCEPT the SSE2 and possibly SSE3 emulation, and let all the other code run directly on the processor. This has been suggested before, but I can't do it myself, since I don't have much coding experience outside of Python and Perl.

Goodwu · September 19, 2005

@crumpo:

I think this MMX emulator is very useful.

We can extend its instruction support to SSE2 and SSE3.

And I think there may be two ways to use this emulator:

1. Replace the original OS X apps that contain SSE2 or SSE3 instructions with a small app loader and move the original apps to another place. The small app loader sets up the SIGILL catcher, then automatically finds the original apps and loads them.

2. Modify the boot scripts (if there are any) and/or boot apps to export the LD_PRELOAD environment variable, so that the SIGILL catcher library is loaded automatically.
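As a rough illustration of what a SIGILL catcher does, here is a minimal sketch. It only shows a handler being installed and invoked: a Python handler cannot inspect or change the CPU registers, so a real catcher would presumably be a C library that uses sigaction with SA_SIGINFO and edits the saved context. Also, as far as I know, the Darwin dynamic linker's counterpart to LD_PRELOAD is DYLD_INSERT_LIBRARIES. The names below are made up for illustration.

```python
import signal

caught = []


def sigill_catcher(signum, frame):
    # A real catcher would decode and emulate the faulting instruction here,
    # then advance the saved instruction pointer past it.
    caught.append(signum)


previous = signal.signal(signal.SIGILL, sigill_catcher)
try:
    # Stand-in for the CPU hitting an unsupported opcode.
    signal.raise_signal(signal.SIGILL)
finally:
    # Put the original handler back so later faults behave normally.
    signal.signal(signal.SIGILL, previous)
```

Both ways listed above end up doing the same thing: making sure a handler like this is installed before the original app's code starts running.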

bonehead · September 19, 2005

@ardosdev:

The problem is, I think you'll have trouble trapping some of the SSE2 instructions. IIRC, many of the "new" SSE2 instructions are extensions to MMX and share the MMX opcodes. These instructions may very well not trigger an illegal instruction exception.

If this is the case, to emulate this you would first need to scrub any code about to be executed for SSE2 instructions that cannot be trapped, patch them at run time, and then mark the block as executable... much the same way VMware scrubs code for non-virtualizable instructions (some x86 instructions don't play well with virtualization) before allowing code execution. Needless to say, this is not a trivial task.
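A naive version of the scrubbing step might look like the sketch below. It is an illustration only: the table holds a handful of SSE3 encodings that I am fairly sure of, not the SSE2 ones, and the function name is made up. A plain byte scan like this will also report false positives, because the same bytes can appear in data or in the middle of another instruction. A real scrubber has to decode the instruction stream properly.

```python
# Assumed opcode table: a subset of SSE3 encodings (mandatory prefix + 0F xx).
SSE3_PATTERNS = {
    b"\x66\x0F\xD0": "addsubpd",
    b"\xF2\x0F\xD0": "addsubps",
    b"\x66\x0F\x7C": "haddpd",
    b"\xF2\x0F\x7C": "haddps",
    b"\x66\x0F\x7D": "hsubpd",
    b"\xF2\x0F\x7D": "hsubps",
    b"\xF2\x0F\xF0": "lddqu",
}


def find_candidates(code: bytes):
    """Return (offset, mnemonic) for every byte offset matching a pattern."""
    hits = []
    for offset in range(len(code) - 2):
        name = SSE3_PATTERNS.get(code[offset:offset + 3])
        if name is not None:
            hits.append((offset, name))
    return hits
```

Each hit would then have to be confirmed by a proper decoder and patched before the block is marked executable, which is where most of the run-time cost would come from.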

Options?

- Try to see if it's possible to patch VMware's scrub routine to also look for SSE2 instructions.

- Try to convince Fabrice Bellard to release the source to his QEMU Accelerator Module for QEMU, and add SSE3 emulation to it.

- Try to patch Darwin to scrub code before execution.

- Test trapping SSE2 instructions on an SSE cpu and hope that I'm wrong.

Goodwu · September 22, 2005

@bonehead:

- Try to convince Fabrice Bellard to release the source to his QEMU Accelerator Module for QEMU, and add SSE3 emulation to it.

There's an open-source QEMU accelerator module named qvm86.

And it is also possible that we process the SSE2 and SSE3 instructions before sending them to the accelerator.

So this is not a problem.

bonehead · September 22, 2005

qvm86 development seems to have stagnated, and isn't up to the speed or compatibility of Fabrice's.

Processing the instructions before "sending" them to the accelerator may be more complicated and performance-hindering than you think.

EDIT:

qvm86 uses virtualization techniques for running ring3 (application/user-mode unprivileged) guest code. This basically means using the CPU MMU and protection mechanisms to run the guest code unmodified.

Because the x86 CPU wasn't designed to be virtualized this isn't possible[1] for privileged kernel code, so we use the normal qemu dynamic translation to emulate that code.

Paul

[1] Some projects do virtualize kernel code, but this requires either modifying the guest code before it is executed (VMware), or using specially modified guest kernels (Xen).

Unfortunately, what this says is that qvm86, and likely Fabrice's accelerator, will not benefit here since there is no actual code modification. "run the guest code unmodified" is the key.

Scrubbing for SSE2 instructions at run time will not be easy without sacrificing a great deal of speed, and it will likely rule out using qvm86.

wiebeest · September 22, 2005

At WWDC 2005 Steve Jobs stated that the rumours were true and Apple had secretly developed an x86 version of each of their versions of OS X.

What the owners of pre-SSE2/SSE3 CPUs would really need is for someone to leak one of the earlier versions of OSx86, from before Tiger (Jaguar, Panther).

Think about it: Jobs stated that Apple developed an x86 shadow version of each version.

Panther was released before SSE2 was mainstream in x86 platform land.

If the switch from IBM PPC to Intel x86 was decided not too long ago, Apple must have held the possibility of AMD open (which didn't support SSE2 until the AMD64 systems).

So if Apple has made x86 copies of pre-Tiger versions too, they must have been versions that don't depend on SSE2/SSE3.

And since Tiger in its essence is not very much more than Panther with widgets and desktop search, that would be a great alternative, wouldn't it?

Problem is how to get it? :rolleyes:

Ehhmm... maybe Apple would be so kind as to freely distribute the earlier versions of OS X (x86), just like they do with System 7.5 and such...

Maybe they actually will...in 2020 or something :unsure:

bonehead · September 22, 2005

@wiebeest:

Think about it: Jobs stated that Apple developed an x86 shadow version of each version. Panther was released before SSE2 was mainstream in x86 platform land.

Yes, but then any of the newer applications compiled with SSE2/3 support (probably a great majority of them, since it'll be a given on OSX86, unlike Windows) won't work. Also, just as the move to the new 10.4.2 build breaks compatibility with the older builds, you can bet the compatibility of even older builds is much worse. You may as well be running OS 9.

Although it's fun to entertain the idea of how to do it, I think upgrading to a new CPU/MB is probably the best path. Most SSE-only systems are starting to show their age now anyway. A lower-end Celeron D or Sempron with SSE3 will usually come close to or beat the highest-end SSE-only CPU (Barton 3200+?) in most things, and you can easily find a MB and CPU combo that'll let you re-use video, RAM, etc.

blahsucks · September 25, 2005

A leak would probably be the only option. Emulation without an accelerator will be slow enough that you would be better off with PearPC.

mikesown · October 5, 2005

I think that the IDEAL solution would be a customizable bootloader that acts as a miniature operating system, with tiered options such as:

SSE2 mode

SSE mode

x86 mode

With those tiered options, anyone, and I mean ANYONE, with an Intel-based processor could run OS X, albeit probably slowly. The tiered options would allow the maximum performance, as someone with an SSE processor wouldn't have to run generic bytecode, while someone with a processor that didn't have any version of SSE could still run the operating system.

Such a program would certainly be tedious to write, and slow; however, it WOULD work.
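Picking the tier could be driven by the CPUID feature flags. The sketch below assumes the bootloader has already executed CPUID leaf 1 and has the EDX and ECX values in hand. The bit positions are the documented ones (EDX bit 25 for SSE, EDX bit 26 for SSE2, ECX bit 0 for SSE3), while the function name and the extra "native" tier for CPUs that already have SSE3 are my own additions.

```python
# CPUID leaf 1 feature bits: EDX bit 25 = SSE, EDX bit 26 = SSE2, ECX bit 0 = SSE3.
SSE_BIT = 1 << 25
SSE2_BIT = 1 << 26
SSE3_BIT = 1 << 0


def pick_mode(edx: int, ecx: int) -> str:
    """Choose the least emulation-heavy tier the CPU can support."""
    if ecx & SSE3_BIT:
        return "native"      # hypothetical extra tier: nothing to emulate
    if edx & SSE2_BIT:
        return "SSE2 mode"   # only SSE3 instructions need emulating
    if edx & SSE_BIT:
        return "SSE mode"    # SSE2 and SSE3 need emulating
    return "x86 mode"        # every SSE instruction needs emulating
```

An Athlon XP, for instance, reports SSE but not SSE2 or SSE3, so it would land in "SSE mode".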

thrunner · October 5, 2005

You can get an Intel Celeron D with SSE3 together with an Intel 915GL board, with fully working Quartz Extreme/CoreImage compatibility and working network/audio, all for about $125 US.

If they ever release OSX86 for generic PCs, what do you think they would charge?

The point is that hardware is so cheap nowadays that it just doesn't make sense to try to run a new OS on old hardware that will be barely compatible (even if SSE works, what about all the other parts?).

mifki · February 8, 2008

lol, seriously man, if you still have sse, go {censored} yourself then drown in a lake

Colonel · February 8, 2008

Wow, you just bumped a thread that's like 3 years old. A new record!
