So where did all this start? With the ToH patches?
Yes indeed. netkas released the "ToH" source patches for the 9.2 kernel on his blog sometime back in May (if I remember correctly), and that gave us the kickstart for getting stock Apple sources to work on machines Apple didn't officially support. Over time, however, we have rewritten almost all of it, so not much of the original ToH source remains.
My initial motivation was to get SpeedStep working properly on my Pentium M laptop. The SpeedStep patch, which paulicat wrote, worked fine on Intel Core or newer CPUs, but it made the system timer run slower on older CPUs like the Pentium M, making the machine almost unusable. After I fixed this trivial issue (there was a bug in the timer scale calculation), my laptop CPU was still running at quite high temperatures.
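To make the "timer scale" idea concrete: a kernel turns raw counter ticks into nanoseconds using the frequency it believes the counter runs at. The interview does not describe the exact bug, and this is not xnu's rtclock code; it is a minimal sketch with a hypothetical `ticks_to_ns()` helper, showing why a wrong frequency makes the clock run slow.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: convert ticks of a counter that the kernel believes
 * runs at tick_hz into nanoseconds. Splitting into whole seconds plus a
 * remainder keeps every intermediate product within 64 bits as long as
 * tick_hz stays below roughly 18 GHz. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t tick_hz)
{
    assert(tick_hz > 0);
    uint64_t whole_seconds = ticks / tick_hz;
    uint64_t remainder     = ticks % tick_hz;   /* always < tick_hz */
    return whole_seconds * NSEC_PER_SEC + remainder * NSEC_PER_SEC / tick_hz;
}
```

For example, if the counter really ticks at 600 MHz but the kernel assumes 1.6 GHz, one real second (600000000 ticks) converts to 375000000 ns, so the system clock advances at only 37.5% of real time.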
Plus, the CPU throttler kext written by Niall Douglas had some deficiencies, so I started writing a new kext that would also allow undervolting the processor, and then came more changes, and more and more. At some point we realized the entire clocking code in xnu was based on too many assumptions, so after a lot of discussion we changed the real-time clock code so it would work properly on new and old CPUs alike.
With so many changes already in, I thought the wider OS X community should benefit from this kernel, so I made the first beta release. It just progressed from there.
I get the impression this kernel is more of a team effort than previous iterations?
Indeed it is. When I made the SpeedStep kext, I put the entire source code online under a BSD license. It was picked up by Superhai, who made massive improvements to it, giving better support for most CPUs. On the kernel side, a lot of the groundwork was done by Turbo with the rtclock fix, and we also collaborated on writing a new SSE3 emulator.
All of us believe that collaboration is key: not releasing sources, or keeping them closely guarded, hampers progress.
The results seem to bear that out.
Absolutely. Within the short span of two months we are close to solving most of the problems that had been plaguing the community. People can finally get SpeedStep and sleep working together, the kernel is more secure with the NX bit enabled on supported systems, and the SSE3 emulator has been totally rewritten to be multithread-capable and three times faster. All this because we offered our time and shared our know-how.
We'll release the full sources as a patchset (nicely commented and tagged with explanations and the original contributors' names) as soon as the final kernel is released, and hopefully we'll see many new developers join us in improving this great kernel.
I'd like to mention that a few months ago I was a total newbie to xnu. I was unable to compile the kernel.
But with the help of everyone on IRC and IM, it was not too difficult to get started. There are still a lot of kinks; for example, compiling xnu is very messy if your system is not already set up for it.
Apple's build environment seems to be set up very differently from traditional GNU/Linux environments, despite being "unix"-like. Once again, thanks to great work by Dense, we have a script that automates getting the sources, patching them, setting up the build environment and compiling the entire kernel, all in one command.
We decided it is very important to make it easy for new developers to join in, so we have the voodoo build script, as well as nicely commented source with explanations of each change we made compared to the stock sources. Much of it is also because we want to shed the 'hackintosh' image and see ourselves as a proper open-source project.
This seems to be a growing trend within Darwin/OS X?
It's important to become 'friends' with Apple so information can flow in both directions. This will help position Darwin as a viable alternative among the various flavors of Unix, thus benefiting Apple, and Apple's support will help us improve the kernel to support more platforms. There are various flavors of Darwin and NeXTSTEP as well:
We have OpenStep/GNUstep, which implement most of the Cocoa API; putting a proper xnu kernel behind them will give us something very close to an all-open-source Mac OS.
But all this requires more people to help with development.
So, a development process like that must have had its highs and lows. Can you remember any particularly frustrating roadblocks, or any breakthroughs that had you popping champagne corks?
Perhaps the biggest roadblock for most people, including me, was understanding how the build process is set up and how the kernel works. The kernel is very different from Linux, for example, and doesn't have the exposure that Linux does. Amit Singh's excellent book about the kernel was a great help. Another problem we had was not having an active maintainer to really drive the process.
Almost everyone has switched to the vanilla kernel, which pushed people like me, as an SSE2 user (and one with quite a bit of time!), to think about legacy support as well. Because of all the discussion it stirred up, Turbo came forward with his excellent rewrite of the SSE3 emulator.
That's the 3rd? 4th? SSE3 emu?
It's the 3rd 'generation'. The original one was written by the great maxxus, as everyone knows, but it was quite slow and incomplete. Semthex/rufus/oui improved it a lot, making a complete emulator, but it had its problems: its design forced us to have a writable commpage area (which is otherwise read-only), and it also limited us to only 2 (or 4) threads of SSE3 emulation at a time.
Some time back, Turbo started writing a user-space emulator for the Apple TV. The old emulator resided in the kernel (although the emulation itself ran mostly in user space). On the Apple TV you cannot swap the kernel without losing functionality, which prompted Turbo to create a user-space emulator, but it was too slow to port over to regular computers. So, with input from Turbo, I started writing a fully reentrant emulator for PCs, but without much assembly experience I couldn't get very far. Turbo finally ported his emulator to xnu, and I helped with testing and gave input until it was able to boot the entire GUI. The difference was that it could run any number of threads.
The old emulator could only run up to a fixed number of threads, with each "slot" taking a lot of space in the commpage (which has a limited number of memory pages).
The new emulator was tested with 90+ threads and worked without a hitch. This also fixed long-standing issues with iCal, fs_usage and Logic Pro, for example. Turbo's original version took advantage of GCC's code generation and made simplifying assumptions about which instructions to emulate, but it turned out that Apple used a lot of exotic SSE3 operands in Leopard. To counter this, I wrote an operand decoder that allows 100% of the SSE3 opcode/operand combinations to be emulated, and thus the 3rd generation was born.
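An x86 "operand decoder" mostly comes down to decoding the ModRM byte and whatever SIB byte and displacement follow it. The sketch below is a hypothetical illustration, not the Voodoo decoder: it handles only 32-bit addressing and ignores prefixes such as 0x67 (address size) and REX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct modrm_info {
    uint8_t mod, reg, rm;   /* the three ModRM fields */
    bool    is_register;    /* r/m names a register rather than memory */
    bool    has_sib;        /* a SIB byte follows the ModRM byte */
    size_t  disp_size;      /* displacement bytes: 0, 1 or 4 */
    size_t  length;         /* ModRM + SIB + displacement, in bytes */
};

/* Hypothetical decoder for the ModRM byte at p (plus any SIB byte and
 * displacement) using 32-bit addressing rules only. */
static struct modrm_info decode_modrm32(const uint8_t *p)
{
    assert(p != NULL);
    struct modrm_info m = {0};
    m.mod = p[0] >> 6;
    m.reg = (p[0] >> 3) & 7;
    m.rm  = p[0] & 7;
    m.length = 1;

    if (m.mod == 3) {                 /* register-direct operand */
        m.is_register = true;
        return m;
    }
    if (m.rm == 4) {                  /* SIB byte follows */
        m.has_sib = true;
        m.length++;
        if (m.mod == 0 && (p[1] & 7) == 5)
            m.disp_size = 4;          /* SIB with no base: disp32 */
    }
    if (m.mod == 0 && m.rm == 5)
        m.disp_size = 4;              /* absolute [disp32] */
    else if (m.mod == 1)
        m.disp_size = 1;              /* [base + disp8] */
    else if (m.mod == 2)
        m.disp_size = 4;              /* [base + disp32] */
    m.length += m.disp_size;
    return m;
}
```

For example, ModRM 0x45 decodes as mod=1, rm=5, i.e. [ebp + disp8], a two-byte operand encoding. Knowing the operand's total length is what lets an emulator locate the memory (or register) operand of an SSE3 instruction and then resume at the next instruction.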
I remember that when we ran the first benchmark against the old emulator, we were expecting it to be about 20% slower.
To our surprise it was 2 times faster. We figured it was because of the commpage writes the old emulator relied on. I also pushed Turbo to optimize the emulator further and gave suggestions, and now the final version is 3x faster.
We had the ingenious idea of patching one of the instructions (lddqu) directly, making it some 400 times faster. This worked fine, but some hard-to-avoid technicalities with the memory manager forced us to abandon it.
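Why is lddqu a natural candidate for patching in place? In 32-bit code, LDDQU xmm, m128 is encoded F2 0F F0 /r and the SSE2 instruction MOVDQU xmm, m128 is F3 0F 6F /r, so both have the same length and share the ModRM/displacement bytes, and for ordinary memory both load the same 16 unaligned bytes. The interview does not say what lddqu was rewritten to, so this is only a sketch of one plausible rewrite using a hypothetical `patch_lddqu_to_movdqu()` helper. It edits bytes in a buffer; writing into a process's code pages (where the memory-manager trouble came from) is out of scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: rewrite LDDQU xmm, m128 (F2 0F F0 /r) at p into
 * MOVDQU xmm, m128 (F3 0F 6F /r). Only opcode bytes change; the ModRM
 * byte and any SIB/displacement stay put, so the length is unchanged.
 * Assumes 32-bit code with no other prefixes before F2. */
static bool patch_lddqu_to_movdqu(uint8_t *p)
{
    assert(p != NULL);
    if (p[0] != 0xF2 || p[1] != 0x0F || p[2] != 0xF0)
        return false;                 /* not an LDDQU */
    if ((p[3] >> 6) == 3)
        return false;                 /* LDDQU has no register-source form */
    p[0] = 0xF3;                      /* F2 prefix -> F3 prefix */
    p[2] = 0x6F;                      /* opcode F0 -> 6F; the 0F escape stays */
    return true;
}
```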
Even without that trick, the new emulator is still on average 3x faster, and it finally allows everyone to run any SSE3 app without failure.
And this is not the biggest celebration we've had...
As I talk with you, people on the "voodoo" team are testing something big.
Thanks to great work by "kaitek", I'm happy to announce that we have an on-the-fly opcode patcher.
This would mean AMD users could run any application that specifically tests for Intel CPUs, without having to rely on decrypts or patched versions.
That's nothing short of revolutionary... and it works?
It's another step in our vision of unifying the various "types" of the xnu kernel into one universal kernel for all. AMD users will no longer have to wait for special versions of apps or kernels. Going by the initial testing I did just a few hours ago, it works 100%, although there are some performance issues when the memory manager is under heavy stress. We are stress-testing it to make sure it introduces no regressions.
I guess this opens the door to retail installs on AMD as well?
Yes, AMD users can now install retail as long as they use this kernel. While this may still not be technically fully vanilla (which usually means using the 'vanilla' kernel in addition to all the other parts of the OS), it is still a great step for all AMD users, who will now enjoy the same benefits as Intel users. An added bonus for SSE2 Intel users is that we can patch any opcode, so the lddqu issue I talked about earlier with the emulator vanishes: we get 400x faster lddqu emulation compared to the 3rd-generation emulator's lddqu routine. As lddqu is also the second most-used SSE3 opcode, this will improve application performance a lot. We still need to benchmark whether the overhead of on-the-fly patching for SSE2 users is higher or lower than the gain from the patch itself, but for AMD cpuid emulation this is not an issue, as cpuid is not performance-critical and is used very sparingly.
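Applications that "test for Intel CPUs" usually read the vendor string from cpuid leaf 0, which comes back as 12 ASCII characters split across EBX, EDX and ECX. Assuming the emulator only needs to rewrite that leaf (the interview doesn't say which leaves the Voodoo handler actually covers), a hypothetical handler and the check it defeats could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Pack four ASCII characters into a register, first character in the
 * lowest byte, which is how CPUID returns the vendor string. */
static uint32_t pack4(const char *s)
{
    return (uint32_t)(uint8_t)s[0]
         | (uint32_t)(uint8_t)s[1] << 8
         | (uint32_t)(uint8_t)s[2] << 16
         | (uint32_t)(uint8_t)s[3] << 24;
}

/* Hypothetical leaf-0 handler: pass through the real highest basic leaf
 * in EAX but report "GenuineIntel" (vendor order is EBX, EDX, ECX). */
static void emulate_cpuid_leaf0(struct cpuid_regs *r, uint32_t real_max_leaf)
{
    assert(r != NULL);
    r->eax = real_max_leaf;
    r->ebx = pack4("Genu");
    r->edx = pack4("ineI");
    r->ecx = pack4("ntel");
}

/* What an application's "is this an Intel CPU?" check boils down to:
 * rebuild the 12-character vendor string and compare it. */
static bool is_genuine_intel(const struct cpuid_regs *r)
{
    const uint32_t parts[3] = { r->ebx, r->edx, r->ecx };
    char vendor[13];
    for (int i = 0; i < 3; i++)
        for (int b = 0; b < 4; b++)
            vendor[i * 4 + b] = (char)((parts[i] >> (8 * b)) & 0xFF);
    vendor[12] = '\0';
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

On a real AMD CPU the same leaf returns "AuthenticAMD" (EBX "Auth", EDX "enti", ECX "cAMD"), which is exactly what such a check rejects.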
Kaitek built an entire disassembler for this, and it is blazingly fast: in our initial reports it disassembled and patched Microsoft Word (a 40 MB image) in 200 milliseconds. I have also made other changes to the timestamp counter and cache detection routines that add support for a lot of legacy CPUs; this part of the code was somewhat 'hackish' in the ToH source, and we are now perfecting these changes. Once we can verify that there are no regressions in functionality or platform support compared to current ToH kernels, the 'voodoo' kernel will be available to all.
It seems to me that on-the-fly patching is a bit of a paradigm shift, whether used for cpuid or SSE3 emulation.
It is similar in spirit to Rosetta: while Rosetta has to translate every PPC instruction into Intel code, we only translate what is required. It's also true that Rosetta has a performance hit, and so does our opcode patcher, but being in-kernel and heavily optimized, it will have (IMO) no *noticeable* performance hit (I'm running this kernel as we talk). Rosetta also has to keep track of translated instruction blocks and translate complex jumps and branches, whereas in our case it is simply a patch of the cpuid instruction to the 'int 0xfb' instruction, so it has no memory footprint either. The current issue is that we can only patch in what fits within the length of the opcode being patched, so not all SSE3 (or other) instructions can be patched. That means the SSE3 emulator will stay in the kernel for some time; however, given the patcher's primary purpose as a cpuid emulator, it serves the AMD community very well. We have word from Leo4All that the next iteration of their DVD will include the voodoo kernel. We are hoping that all distros will switch to this unified kernel so that we avoid fragmenting efforts within the community. In essence, we can forget about 'intel/amd/sse2/sse3/nohpet/sleep/speedstep/nonx' tags completely!
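The core of the cpuid patch described above is tiny: CPUID is the two bytes 0F A2 and INT 0xFB is the two bytes CD FB, so one replaces the other without moving anything. The sketch below uses hypothetical helper names and is not kaitek's code; it also shows the length limit mentioned above, refusing a replacement longer than the original instruction and padding a shorter one with NOPs. It assumes the caller already knows where the instruction starts (that is the disassembler's job), since blindly scanning for 0F A2 would also hit those bytes inside other instructions' operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP 0x90

/* Hypothetical helper: overwrite one already-decoded instruction of
 * orig_len bytes at p. Refuses replacements longer than the original,
 * since they would clobber the next instruction; shorter ones are
 * padded with NOPs so the following instruction stays where it was. */
static bool patch_in_place(uint8_t *p, size_t orig_len,
                           const uint8_t *repl, size_t repl_len)
{
    assert(p != NULL && repl != NULL);
    if (repl_len > orig_len)
        return false;
    memcpy(p, repl, repl_len);
    memset(p + repl_len, X86_NOP, orig_len - repl_len);
    return true;
}

/* CPUID (0F A2) -> INT 0xFB (CD FB): both two bytes, so it always fits.
 * p must point at the start of a real instruction. */
static bool patch_cpuid(uint8_t *p)
{
    static const uint8_t int_fb[2] = { 0xCD, 0xFB };
    if (p[0] != 0x0F || p[1] != 0xA2)
        return false;
    return patch_in_place(p, 2, int_fb, sizeof int_fb);
}
```

When the patched code later executes INT 0xFB, the kernel's handler for that vector can perform the emulated cpuid and return to the instruction right after the patch.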
You drop this kernel on your system and it works (well, except for certain old Pentium 4 users, but we are also collaborating with the Chameleon bootloader developers to fix that issue).
What's the issue?
The issue is with the new clock initialization routine. David Elliot (otherwise known as 'dfe') has a nice fix to disable HPET on unsupported systems, so we decided to remove the HPET emulation entirely, since its only consumer, the Apple-supplied SpeedStep driver, is replaced by our SpeedStep kext, and we replaced the clock initialization routine so that it reads the bus ratio directly from the processor. The register holding the bus ratio is implemented by all AMDs and by most Intel processors from the Pentium M and Pentium 4 model 3 onward, but the old Willamette- and Northwood-core P4s don't have it. We could have put a clock timing routine in the kernel to calculate the CPU frequency and bus ratio, without which xnu will panic, but we learnt that the Chameleon bootloader already does this, so instead of reinventing the wheel we simply asked the Chameleon devs to also export the CPU frequency in addition to the FSB frequency in the fake EFI tree. For users who cannot or do not want to upgrade their bootloader, there is a boot-time flag they can set to pass their bus ratio to the kernel explicitly. The patch for Chameleon is already in their repository, and the build will be released at the same time as the kernel, so P4 users don't have to wait. Collaboration, it seems, is a recurrent theme.
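The arithmetic behind this is simply CPU frequency = FSB frequency × bus ratio: if the bootloader exports both frequencies the kernel can recover the ratio, and if the user passes the ratio it can recover the CPU frequency. A minimal sketch with hypothetical helper names, assuming "FSB frequency" means the base bus clock in Hz (e.g. 200 MHz rather than a quad-pumped 800 MHz figure); it does not reproduce the actual boot flag or EFI property names, which the interview doesn't give.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: derive the bus ratio from the CPU and FSB
 * frequencies (both in Hz), in tenths so that 9.5x is stored as 95.
 * Adding fsb_hz / 2 before dividing rounds to the nearest tenth. */
static uint32_t bus_ratio_tenths(uint64_t cpu_hz, uint64_t fsb_hz)
{
    assert(fsb_hz > 0);
    return (uint32_t)((cpu_hz * 10 + fsb_hz / 2) / fsb_hz);
}

/* Hypothetical inverse: recover the CPU frequency when the ratio is
 * supplied explicitly, e.g. by a boot-time setting. */
static uint64_t cpu_hz_from_ratio(uint64_t fsb_hz, uint32_t ratio_tenths)
{
    return fsb_hz * ratio_tenths / 10;
}
```

Storing the ratio in tenths keeps half-step multipliers such as 9.5x exact in integer math, which matters because kernel code generally avoids floating point.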
We are hoping other projects, perhaps OpenStep/GNUstep, could benefit. The Linux source code is very large and complicated, but xnu, being a hybrid of a microkernel and a monolithic kernel, is just a 7.5 MB download and compiles in minutes, which makes it a good study tool for anyone interested in kernel development (especially when Amit Singh's excellent book Mac OS X Internals is used with it!). The origins of xnu can also be traced to an academic institution, Carnegie Mellon to be specific, where the Mach subsystem of the kernel was written. Mach, however, is a microkernel, while xnu is not: combining BSD and Mach into one was a design decision made by Apple (studying how they are integrated is also interesting for a lot of engineering students).
I've got a vague idea of what a microkernel is, and what a monolithic kernel is. Is there a short explanation of where xnu falls and what the difference is?
A microkernel like Mach contains only the most basic code: initializing the processor and memory, creating threads/processes, and passing messages between them. Almost all other functionality, often including the hardware drivers, lives in userland (i.e. runs less privileged), but this means those components all have to communicate with the microkernel very often, which is slow. A monolithic kernel like Linux takes a different approach: all drivers and many other components, like the networking stack, are put directly into the kernel as one huge binary (to put it simply) that is loaded once and shares memory, so performance is better. But this also means a failed or crashed driver (or other subsystem) will crash the entire system or compromise its security, so it's a tradeoff between performance on one side and stability/security on the other.
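A toy illustration of the difference: in a monolithic kernel a request is just a function call into the driver, while in a microkernel the request is packed into a message, copied to the driver, and the reply copied back. This is a single-threaded model with hypothetical names; real Mach IPC involves ports, context switches and privilege transitions, which is where the actual cost comes from, but the extra steps are already visible.

```c
#include <assert.h>
#include <string.h>

/* Toy "driver": answers a request by doubling its argument. */
static int driver_handle(int request) { return request * 2; }

/* Monolithic style: the kernel simply calls into the driver. */
static int monolithic_read(int request) { return driver_handle(request); }

/* Microkernel style: requests and replies travel as messages. */
enum { MSG_READ = 1, MSG_REPLY = 2 };      /* arbitrary toy opcodes */

struct message { int opcode; int payload; };
struct mailbox { struct message slot; int full; };

static void send_msg(struct mailbox *mb, const struct message *m)
{
    assert(!mb->full);                      /* one-slot mailbox */
    memcpy(&mb->slot, m, sizeof *m);        /* stands in for a cross-address-space copy */
    mb->full = 1;
}

static struct message receive_msg(struct mailbox *mb)
{
    assert(mb->full);
    mb->full = 0;
    return mb->slot;
}

/* One round trip: client -> driver "server" -> client. */
static int microkernel_read(int request)
{
    struct mailbox to_driver = {0}, to_client = {0};

    struct message req = { MSG_READ, request };
    send_msg(&to_driver, &req);                          /* client sends */

    struct message in = receive_msg(&to_driver);         /* driver receives */
    assert(in.opcode == MSG_READ);
    struct message reply = { MSG_REPLY, driver_handle(in.payload) };
    send_msg(&to_client, &reply);                        /* driver replies */

    return receive_msg(&to_client).payload;              /* client receives */
}
```

Both paths compute the same answer; the microkernel path just does two sends, two receives and two copies to get there.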
Apple chose an almost-monolithic path: the Mach microkernel, the BSD subsystem (which provides the "unix" services) and the driver interface are all one binary sharing memory, which makes them fast, but they are built as separate components and linked together. Most of the other components, like drivers, are loaded later (but also end up in the kernel), so it's still a bit more modular than Linux, though not a true microkernel. Apple decided that a failed userspace, even with the kernel running, is still a crashed system from the user's point of view, so they gravitated towards performance (and we all know Apple's systems are stable as well, owing to their tried-and-tested Mach/BSD heritage!).
Apple has also engineered its driver interface (IOKit) brilliantly: it lets us load the same driver binary across different kernel versions, and it has a nice 'matching' scheme that enables things like NullCPUPowerManagement.kext, which disables Apple's SpeedStep/PM driver, or natit, a graphics settings injector. All of this makes it possible to run an unmodified (legally obtained!) OS X on commodity hardware.
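The 'matching' idea can be sketched as a toy: each driver declares what kind of provider it wants to attach to plus a probe score, and among the drivers that match, the highest score wins. This is a heavily simplified model of IOKit's passive matching (real personalities are property dictionaries, and drivers also get a chance to probe the hardware); the driver and provider names are hypothetical, and it does not show how NullCPUPowerManagement itself is written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy driver "personality": hypothetical, much simpler than IOKit's. */
struct personality {
    const char *driver;          /* driver name */
    const char *provider_class;  /* kind of object the driver attaches to */
    int         probe_score;     /* higher wins among matching candidates */
};

/* Pick the best-scoring candidate whose provider class matches, or NULL. */
static const struct personality *
best_match(const struct personality *cands, size_t n, const char *provider_class)
{
    assert(provider_class != NULL);
    const struct personality *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cands[i].provider_class, provider_class) != 0)
            continue;                               /* does not match */
        if (best == NULL || cands[i].probe_score > best->probe_score)
            best = &cands[i];
    }
    return best;
}
```

In this model, a replacement driver that matches the same provider with a higher score is chosen instead of the stock one, without modifying the stock driver's binary.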
Wow. Thank you for giving us this insight; it's been an education. Is there anything else you'd like to add before we finish?
Hmm, I'd just like to add that a lot of people have helped us along: tireless testers like bhast2 (Leo4All creator, AMD testing) and sckevyn (who stress-tested the SSE3 emulator in a professional fashion), our motivator/manager Galaxy (who actually got us all together; we wouldn't be doing this if not for him), the Chameleon team, the people on IRC, and everyone else. We just hope we get even more people to help us out!
Latest information: http://groups.google...nu-dev/web/home