Some news: when you get this error in your system.log...

Jul 18 15:30:05 localhost kernel[0]: disk1s3: I/O error.
Jul 18 15:30:05 localhost kernel[0]: jnl: do_jnl_io: strategy err 0x5
Jul 18 15:30:05 localhost kernel[0]: jnl: write_journal_header: error writing the journal header!

As I said before, this makes that partition read-only, but since the mount command seems not to know it, your system will crash in a few minutes.

(I've noticed that this crash is NOT a kernel crash. The forcedeth extension continues to work, i.e. it still returns standard ICMP replies when I ping the Mac's IP from another computer.)

 

Well, after seeing that error and noticing that I couldn't write more files to root partition, I issued this:

sudo -s
mount -u -rf /dev/disk1s3 /
mount -u -w /dev/disk1s3 /

(replace disk1s3 with the partition that failed, usually root partition)

This causes OS X to remount your root partition, giving you write access again to your files, and hopefully giving you also some more minutes before the crash.

 

I've been using OS X for 45 minutes this way (2 remounts so far), which is way more than I used to get before a crash. I hope I can stop the fatal crash for good! If so, I'll write a script that greps the logs and remounts the partition when this happens.

 

OS X is really beautiful now that I can even launch iTunes without it crashing!! :)

 

However, don't count on it: this may still be a placebo effect, and OS X may crash anytime soon.... :happymac:

EDIT: It crashed after 65 minutes. Another useless discovery by me...

 

Notice that I didn't test what happens with corruption, as I still haven't rebooted OS X (no need to yet :blink: ).

EDIT: Corruption was so large that it rendered the root partition unbootable & unfsckable. It was a fresh setup. So... better forget about this....

A tip, jape: give up. Writing a driver is the only way to fix this, and this is way beyond my capabilities. If it weren't, I would have done it already.

 

There is no n00bish documentation for programming a driver on the Mac; it all assumes you know what the hell it's talking about. So, if you ever want to get it working, become a hero and write a driver for it. Otherwise, give up and just buy a PATA hard drive.

Writing a driver is the only way to fix this

 

I think that's wrong (AFAIK).

 

nv's SATA implementation is just PATA with a few add-ons. In fact, it's so similar to PATA that some OSes believe it's a PATA controller, namely:

  • MS-DOS 7 (sorry not to provide Debug information, I don't know how to do it easily. However, I do see my FAT32 (this is win98 boot disk) partition appearing as letter C:... a partition which is primary on my SATA Disk )
  • Linux 2.4. Have a look at this output of cat /proc/pci :
     Bus  0, device   6, function  0:
    IDE interface: PCI device 10de:0053 (nVidia Corporation) (rev 242).
      Master Capable.  No bursts.  Min Gnt=3.Max Lat=1.
      I/O at 0xf000 [0xf00f].
     Bus  0, device   7, function  0:
    IDE interface: PCI device 10de:0054 (nVidia Corporation) (rev 243).
      IRQ 20.
      Master Capable.  No bursts.  Min Gnt=3.Max Lat=1.
      I/O at 0x9f0 [0x9f7].
      I/O at 0xbf0 [0xbf3].
      I/O at 0x970 [0x977].
      I/O at 0xb70 [0xb73].
      I/O at 0xd800 [0xd80f].
      Non-prefetchable 32 bit memory at 0xd8002000 [0xd8002fff].
     Bus  0, device   8, function  0:
    IDE interface: PCI device 10de:0055 (nVidia Corporation) (rev 243).
      IRQ 23.
      Master Capable.  No bursts.  Min Gnt=3.Max Lat=1.
      I/O at 0x9e0 [0x9e7].
      I/O at 0xbe0 [0xbe3].
      I/O at 0x960 [0x967].
      I/O at 0xb60 [0xb63].
      I/O at 0xc400 [0xc40f].
      Non-prefetchable 32 bit memory at 0xd8001000 [0xd8001fff].


    Visible differences? Some. But to the system, SATA just appears as a "more advanced" plain old IDE controller.

I used Knoppix 2.7 (based on linux24) to try to mount it. Grep'd the logs, and this surprised me:

NFORCE-CK804: 00:06.0 (rev f2) UDMA133 controller
ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
NFORCE-CK804-SATA: IDE controller at PCI slot 00:07.0
NFORCE-CK804-SATA: chipset revision 243
NFORCE-CK804-SATA: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE-CK804-SATA: 00:07.0 (rev f3) UDMA133 controller
NFORCE-CK804-SATA: 100% native mode on irq 20
ide2: BM-DMA at 0xd800-0xd807, BIOS settings: hde:DMA, hdf:DMA
ide3: BM-DMA at 0xd808-0xd80f, BIOS settings: hdg:DMA, hdh:pio
NFORCE-CK804-SATA2: IDE controller at PCI slot 00:08.0
NFORCE-CK804-SATA2: chipset revision 243
NFORCE-CK804-SATA2: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE-CK804-SATA2: 00:08.0 (rev f3) UDMA133 controller
NFORCE-CK804-SATA2: 100% native mode on irq 23
ide4: BM-DMA at 0xc400-0xc407, BIOS settings: hdi:DMA, hdj:DMA
ide5: BM-DMA at 0xc408-0xc40f, BIOS settings: hdk:DMA, hdl:pio
hdi: Maxtor 6V200E0, ATA DISK drive
ide4 at 0x9e0-0x9e7,0xbe2 on irq 23

Yes! The amd74xx module accepts both the PATA controller (amd74xx is the standard module linux24 used for nForce PATA) and ... the two SATA controllers! Maxtor 6V200E0 IS MY SATA DRIVE.

 

I googled for the string "NFORCE-CK804-SATA2"; this is the result:

http://lkml.org/lkml/2004/5/27/155

From: Bartlomiej Zolnierkiewicz <B.Zolnierkiewicz@elka.pw.edu.pl>

 

From: "Brian Lazara" <blazara@nvidia.com>

 

Add device IDs for new nForce IDE and SATA controllers. Rename some of the existing controller names to correctly match released product names.

Excerpt from the attachment: (a patch)

+++ 25-akpm/drivers/ide/pci/amd74xx.c	Tue Jun  1 17:06:26 2004
+#define PCI_DEVICE_ID_NVIDIA_NFORCE_CK804_IDE	0x0053
+#define PCI_DEVICE_ID_NVIDIA_NFORCE_CK804_SATA	0x0054
+#define PCI_DEVICE_ID_NVIDIA_NFORCE_CK804_SATA2	0x0055

So... nvidia in their infinite wisdom, enabled nForce SATA support in Linux 2.4 just by adding the nForce SATA Controller pci ids to the already existing & working amd74xx PATA driver!!!

We (hackintoshers) were already doing the same, but we were looking at the wrong driver (VIA).

 

 

What happened to these IDs on the current 2.6 kernel? Why doesn't it use amd74xx for my SATA drives too? I googled again:

http://lkml.org/lkml/2004/5/27/190

> 3) Normally we want to add SATA support to libata not drivers/ide. Do
> the nVidia SATA chips support SATA SCRs or anything like that? Why not
> use libata?

We do plan on moving our SATA support to libata.

They moved to libata indeed, and as of Thu, 17 Jun 2004 00:33:21 -0400 those device IDs were removed from the 2.6 branch's amd74xx module: sata_nv was then created.

 

To sum it up: we need to trick the driver which handles nForce PATA into handling the SATA controllers too -- it might be as simple as the AppleVIAATA trick... I haven't reinstalled OS X yet, so I don't really know which driver it is.

 

WARNING, if someone wants to test that: This might even damage your partition table, so backup first.

 

I'd like to hear your thoughts about this....

I failed once again...

 

I tried putting the nForce 4 SATA Controller device id in...

IOATAFamily.kext/Contents/Plugins/AppleOnBoardPCATA.kext/Contents/Info.plist : nForceATAPCI

 

But after a reboot it couldn't find the root device (i.e. that kext said "no match"/"no drives here"). No errors in the verbose log either...

 

 

So, if a hero shows up and wants to fix it.. he'll need to fix AppleOnBoardPCATA too...

 

EDIT: Not everything is lost, maybe: [HowTo] Speed up hard disk access

static const HardwareInfo hardwareTable[] =
{
	{ 0x01bc10de, 5, "NVIDIA nForce"  },
	{ 0x006510de, 6, "NVIDIA nForce2" },
	{ 0x00d510de, 6, "NVIDIA nForce3" },
};

It seems that the OnBoardPCATA driver has got a hardcoded dev id table.
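
For anyone following along, here is a minimal, standalone sketch of how a hardcoded table like this is typically matched. It assumes each entry's first field packs the PCI device ID in the high 16 bits and the vendor ID (0x10de for NVIDIA) in the low 16 bits, which is what the values above suggest. The field names and the `packPciId` / `findHardware` helpers are made up for illustration; they are not taken from Apple's sources.

```cpp
#include <cstdint>

// Mirrors the shape of the table quoted above; the field names are assumed.
struct HardwareInfo {
    std::uint32_t pciId;        // (device ID << 16) | vendor ID
    std::uint8_t  maxUltraMode; // highest UDMA mode the driver will select
    const char*   name;
};

static const HardwareInfo hardwareTable[] = {
    { 0x01bc10de, 5, "NVIDIA nForce"  },
    { 0x006510de, 6, "NVIDIA nForce2" },
    { 0x00d510de, 6, "NVIDIA nForce3" },
};

// Pack separate vendor/device IDs into the 32-bit form used by the table.
constexpr std::uint32_t packPciId(std::uint16_t vendor, std::uint16_t device) {
    return (static_cast<std::uint32_t>(device) << 16) | vendor;
}

// Hypothetical lookup: returns the matching entry, or nullptr when the
// controller is not listed in the table.
inline const HardwareInfo* findHardware(std::uint32_t pciId) {
    for (const HardwareInfo& entry : hardwareTable) {
        if (entry.pciId == pciId) return &entry;
    }
    return nullptr;
}

// The CK804 SATA2 controller (10de:0055) packs to the value used in this thread.
static_assert(packPciId(0x10de, 0x0055) == 0x005510de, "unexpected packing");
```

With this table, `findHardware(packPciId(0x10de, 0x0055))` returns `nullptr`. If the real driver does something similar, that would be consistent with a controller that is listed only in Info.plist still getting "no match".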

 

May I ask someone to build me an AppleOnBoardPCATA which has "0x005310de" as "NVIDIA nForce 4", and both "0x005410de" and "0x005510de" as "NVIDIA nForce 4 SATA"? (maxUltraMode 6 on all of them), i.e.:

 

static const HardwareInfo hardwareTable[] =
{
	{ 0x01bc10de, 5, "NVIDIA nForce"  },
	{ 0x006510de, 6, "NVIDIA nForce2" },
	{ 0x00d510de, 6, "NVIDIA nForce3" },
	{ 0x005310de, 6, "NVIDIA nForce4" },
	{ 0x005410de, 6, "NVIDIA nForce4 SATA 1" },
	{ 0x005510de, 6, "NVIDIA nForce4 SATA 2" },
};

 

And I'd like some opinions too... :( Am I crazy? Maybe...

You're very enthusiastic, just like I was, but you're smarter than me and know what's going on. You know what you're looking for, which I never did, so I was never really much help; I just did many trial and error tests, and pretty much everything was error though :rolleyes: lol. Ask planetbeing; he offered his services at one stage, and now that you have researched it more he may be able to help.

 

If I can add assistance, don't hesitate to ask. I may be able to move all of my things to my other computer and then I can try this worry-free. I have 6 weeks free starting tomorrow, so I will probably get bored and try this again. I haven't missed much in the time I have been inactive.

 

TheFighter

 

Edit: Before I forget, what version of the nForce driver for Windows is everybody running? When I updated, it messed up my boot times and slowed them right down. And that really annoying hotswap feature too. Do you use the one for AMD only or AMD/Intel?

Thanks a lot :) . I don't really know a lot about ATA either... just fished the Linux Kernel Mailing Lists ;)

 

BTW, I requested an AppleOnBoardPCATA.kext built with nForce 4 SATA dev ids ( details here ), since I don't have XCode.

My nForce 4 PATA controller is recognized using that kernel extension, so I assume that my OSX build was already somehow patched, and that my kext binary already has the nForce PATA id built in.

 

It would be interesting to see which kext OS X loads for nForce PATA when the AppleOnBoardPCATA.kext is not present or does not have the nForce 4 PATA id.

 

Edit: Before I forget, what version of the nForce driver for Windows is everybody running? When I updated, it messed up my boot times and slowed them right down. And that really annoying hotswap feature too. Do you use the one for AMD only or AMD/Intel?

I have an ASUS A8N-E, and I use the 6.85 x16 Intel-AMD ones although I was supposed to use the 6.70 AMD ones -- blame ActiveArmor, it doesn't work with 6.70 for me ;) .

 

It seems that now there's a unified nForce driver again. Both "nForce 4 AMD series" and "nForce 4 Intel x16" redirect to Version 6.86

After lots of tries... I finally booted from SATA using the AppleOnBoardPCATA.kext .

 

Jul 24 02:41:35 localhost kernel[0]: Extension "com.apple.driver.AppleOnboardPCATA" has no kernel dependency.

Jul 24 02:41:35 localhost kernel[0]: Primary going NATIVESec going NATIVEStart ATA Channel client matchingAppleNVIDIAnForceATA: start( 0x2f46a00, 0x2eada00 )

Jul 24 02:41:35 localhost kernel[0]: [CH0] getBMBaseAddress

Jul 24 02:41:35 localhost kernel[0]: BMBaseAddr = c400

Jul 24 02:41:35 localhost kernel[0]: PCI_IDE_ENABLE 0x13

Jul 24 02:41:35 localhost kernel[0]: PCI_IDE_CONFIG 0x00

Jul 24 02:41:35 localhost kernel[0]: PCI_CABLE_DETECT 0x00

Jul 24 02:41:35 localhost kernel[0]: PCI_FIFO_CONFIG 0x15

Jul 24 02:41:35 localhost kernel[0]: [CH0] resetBusTimings

Jul 24 02:41:35 localhost kernel[0]: --USING IRQ 23[CH0] configureTFPointers

Jul 24 02:41:35 localhost kernel[0]: [CH0] scanForDrives

Jul 24 02:41:35 localhost kernel[0]: AppleNVIDIAnForceATA: start( 0x2f46c00, 0x2f00100 )

Jul 24 02:41:35 localhost kernel[0]: [CH1] getBMBaseAddress

Jul 24 02:41:35 localhost kernel[0]: BMBaseAddr = c408

Jul 24 02:41:35 localhost kernel[0]: PCI_IDE_ENABLE 0x13

Jul 24 02:41:35 localhost kernel[0]: PCI_IDE_CONFIG 0xf0

Jul 24 02:41:35 localhost kernel[0]: PCI_CABLE_DETECT 0x00

Jul 24 02:41:35 localhost kernel[0]: PCI_FIFO_CONFIG 0x15

Jul 24 02:41:35 localhost kernel[0]: [CH1] resetBusTimings

Jul 24 02:41:35 localhost kernel[0]: --USING IRQ 23[CH1] configureTFPointers

Jul 24 02:41:35 localhost kernel[0]: [CH1] scanForDrives

Jul 24 02:41:35 localhost kernel[0]: Super started----uh, oh--AppleNVIDIAnForceATA::free( 0x2f46c00 )

Jul 24 02:41:35 localhost kernel[0]: [CH0] provideBusInfo( 0x2f2e2e0 )

Jul 24 02:41:35 localhost kernel[0]: AppleNVIDIAnForceATA: NVIDIA nForce4 SATA2 (CMD 0x9e0, CTR 0xbe0, IRQ 23, BM 0xc400)

Jul 24 02:41:35 localhost kernel[0]: [CH0] provideBusInfo( 0x2fccb20 )

Jul 24 02:41:35 localhost kernel[0]: [CH0 D0] getConfig( 0x2fcc980 )

Jul 24 02:41:35 localhost kernel[0]: PIO mode 0 @ 600 ns

Jul 24 02:41:35 localhost kernel[0]: [CH0 D0] selectConfig( 0x2fcc980 )

Jul 24 02:41:35 localhost kernel[0]: [CH0 D0] selectTimingParameter( 0x2fcc980 )

Jul 24 02:41:35 localhost kernel[0]: selected PIO timing entry 4

Jul 24 02:41:35 localhost kernel[0]: selected Ultra mode 6

Jul 24 02:41:35 localhost kernel[0]: [CH0 D0] getConfig( 0x2fcc980 )

Jul 24 02:41:35 localhost kernel[0]: PIO mode 4 @ 120 ns

Jul 24 02:41:35 localhost kernel[0]: Ultra mode 6

Jul 24 02:41:35 localhost kernel[0]: Got boot device = IOService:/AppleACPIPlatformExpert/PCI0@0/AppleACPIPCI/SAT2@8/AppleOnboardPCATARoot/PRI0@0/AppleNVIDIAnForceATA/ATADeviceNub@0/IOATABlockStorageDriver/IOATABlockStorageDevice/IOBlockStorageDriver/Maxtor 6V200E0 Media/IOFDiskPartitionScheme/Untitled 3@3

Jul 24 02:41:35 localhost kernel[0]: BSD root: disk0s3, major 14, minor 3

 

However, it fails the same way it failed with the VIA ATA extension: crashes, freezes, corruption, and "IOATA device blocking bus". Too bad :)

 

So I failed again...

 

I noticed that the VIA ATA source code and the nForce ATA sources are VERY similar. In fact, the only real difference between VIA SATA and nForce ATA is that the latter sets timings for each device (including SATA here); VIA does not set SATA timings.

 

I'm starting to feel that we're looking at the wrong layer, and that this may be a much more difficult problem, hidden within IOATAController itself. Even if I decided to write an extension from scratch, I find the existing code so similar to Linux's amd74xx that I am not sure it would work.

 

Any suggestions?

Here is some info about what I did to the AppleOnboardPCATA.kext. As I said, it is non-working (works like the VIA SATA trick...), but it may be useful for someone.

 

---FILE: AppleNVIDIAnForceATA.cpp

---FIND:

	{ 0x01bc10de, 5, "NVIDIA nForce"  },
	{ 0x006510de, 6, "NVIDIA nForce2" },
	{ 0x00d510de, 6, "NVIDIA nForce3" },

---ADD CODE, AFTER:

{ 0x005510de, 6, "NVIDIA nForce4 SATA2" },

-----This is the secondary SATA controller device id, I have my hard drive there. Replace it with 0x0054 if necessary.

---FIND:

	if (ultraModeNumber > 2)
	{
		if ( f80PinCablePresent == false )
		{
			ERROR_LOG("%s: 80-conductor cable not detected on channel %u\n",
				getName(), fChannelNumber);
			ultraModeNumber = 2;
		}
	}

---REPLACE THAT WITH:

	if (ultraModeNumber > 2 && false) // SATA: ignore this check
	{
		if ( f80PinCablePresent == false )
		{
			ERROR_LOG("%s: 80-conductor cable not detected on channel %u\n",
				getName(), fChannelNumber);
			ultraModeNumber = 2;
		}
	}

------80-conductor cable is NOT being detected on SATA, so it tried to use ultraMode 2, which failed. I had a look at the VIA driver, which ignored the 80-cable warning, so I ignored it here too.

--FILE: AppleOnboardPCATARoot.cpp

--FIND:

OSDictionary * CLASS::createNativeModeChannelInfo( UInt32 channelID )
{
UInt8  pi = fProvider->configRead8( kIOPCIConfigClassCode );
UInt16 cmdPort = 0;
UInt16 ctrPort = 0;

--AFTER, ADD:

	//Force native mode for SATA
pi = 0xFF;

------This is also inspired by the VIA ATA driver. If this is not added, driver would always use Legacy mode (IRQs 14-15), although the Bus Master base address would be OK (and it would fail to enum drives).

--SAVE ALL & BUILD

----Built extension will be saved to build/Development/AppleOnboardPCATA.kext

--INSTALL the extension by moving it to IOATAController.kext/Contents/Plugins and fixing permissions (root:wheel & 755)

--FILE: IOATAController.kext/Contents/Info.plist

--FIND: ApplenForcePCI

--ADD YOUR DEVICE IDs

--SAVE

--DELETE /System/Library/Extensions.mkext /System/Library/Extensions.kextcache
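
A note on the `pi = 0xFF` override in the steps above, as a hedged sketch. For PCI IDE-class controllers, the programming-interface byte of the class code uses bit 0 for "primary channel in native mode" and bit 2 for "secondary channel in native mode" (that part is standard PCI IDE). Whether AppleOnboardPCATARoot tests exactly these bits is an assumption here, and `isNativeMode` is a hypothetical helper, not a function from the driver. Forcing the byte to 0xFF sets both bits, so both channels read as native:

```cpp
#include <cassert>
#include <cstdint>

// Programming-interface bits for a PCI IDE controller (standard PCI IDE layout).
constexpr std::uint8_t kPrimaryNative   = 0x01; // bit 0: primary channel in native mode
constexpr std::uint8_t kSecondaryNative = 0x04; // bit 2: secondary channel in native mode

// Hypothetical helper: true when the given channel (0 = primary, 1 = secondary)
// reports native mode in the programming-interface byte.
inline bool isNativeMode(std::uint8_t progIf, unsigned channel) {
    assert(channel < 2);
    const std::uint8_t mask = (channel == 0) ? kPrimaryNative : kSecondaryNative;
    return (progIf & mask) != 0;
}
```

For example, a byte of 0x8A (bus master capable, both channels in legacy mode) gives `false` for both channels, while 0xFF gives `true` for both.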

 

BTW, the driver will unload itself if it does not find a drive. This seems to be standard behaviour with AppleOnboardPCATA.

 

Data corruption is still present as with the VIA ATA driver, you have been warned.

It would be interesting to see which kext OS X loads for nForce PATA when the AppleOnBoardPCATA.kext is not present or does not have the nForce 4 PATA id.

As far as I remember, AppleGenericPCATA is loaded "by default". But it is way too slooow (and eats CPU like a hungry bear). And what about trying to lower the DMA mode for the SATA controllers to 2? (I tried this with PATA; XBench scores dropped just a little.)


Thanks for comments.

 

I tried setting "maxUltraMode" to 0, and the driver loaded using "PIO mode 4/Ultra mode 0", which is a bit weird to say the least. It was slower, as you said, but an "ATA blocking bus" message still appeared. I do have a feeling that it corrupts less in this mode, but it is not acceptable either.

 

BTW, I need native mode support for SATA to work, but I found this in AppleGenericPCATA...

// FIXME: add native mode support

So AppleGenericPCATA is out of the question for SATA then.

  • 2 weeks later...

During one of my tests (trying GenericPCATA)... my Maxtor died. It was not detected by the BIOS anymore.

 

I took it to a friend's computer, we connected it to his VIA SATA controller and his BIOS detected it, but the mbr & partition table were damaged.

 

I wiped the drive using a Maxtor floppy, connected again to my computer and it worked, but since I had to restore from backup, the latest kext experiments were lost. I still have the ApplenForcePCATA-2 patches (the ones which fail like the VIA SATA).

 

Just a warning to kext doctors ;) I am a little scared right now. If I were to accept to lose a hard drive, I would buy a PATA one right now.

Isn't the major difference between nForce 3 and nForce 4 the SATAII specification?

 

I think SATAII is supposed to have NCQ Support, 3.0Gb/s transfer, Hot Swap, and possibly other features.

 

Is there any chance that BIOS changes could have made a difference? What if we took a board with an original early BIOS? I know it's a long shot, but updating my BIOS made my ethernet jack cut out, so I suppose it's a possibility.

I have what I think is an interesting experience to bring to this thread. I am a PC user, and while searching for a solution to a problem with my computer, I found this thread, which proved very interesting. But let's not jump to conclusions.

 

About a year ago, I bought a new motherboard + CPU + RAM + graphics card for my computer: AMD XP 64 3000+, A8N-SLI (nForce 4), 1 GB RAM and a GeForce. I usually run Windows 2000, occasionally Linux (Mandriva). I had two hard drives, both IDE PATA. Everything was working out well.

About 8 months ago, the smallest and oldest of my hard drives died. I bought a new one, a 250 GB SATA drive. That's when things got ugly. At first, it would freeze/crash very often when I accessed the SATA drive, especially on large file transfers. On some reboots, the SATA hard drive wouldn't be detected. I had to shut down the computer completely and restart it a few minutes later for the BIOS to detect it again.

To solve my problem, I flashed my BIOS. The freezes became MUCH rarer, but still continued to happen, mostly on large file transfers on the SATA drive.

Recently, I reinstalled my Linux, and I discovered that the crashes would happen even on Linus Torvalds's system, even more often than on Windows, and always when accessing the SATA hard drive.

I've searched the internet and found quite a few matching problems, including this thread. Even though my problem is on a PC and this forum is about Mac, I think the problems are identical.

 

My conclusion is this one : The nForce4 chip has a very low level bug on SATA drives read/write routines. I have no idea what a definitive solution could be, except maybe changing the motherboard.

I don't believe that is "our problem", cause I have NO freezes at all using Ubuntu 6.06 & Windows XP -- and I usually run several VMs at a time, causing lots of disk activity.

 

However, as I said, OS X hangs every few minutes in this computer.

 

So I think we can surely assume this is not the result of a nForce bug. Even if this bug existed, a patch does NOT exist in the linux sources.... and the kernel works flawlessly.

 

Try to boot MS-DOS 7 (FAT32 support), and access a SATA disk. You'll see how it works better than using OS X.

I find it really weird, as DOS does NOT have native-mode IDE support, which means it uses legacy mode, which fails completely (it does not detect any drive) on OS X (as I discovered while patching the nForceATA kext).

 

We really need an expert here: a "The nForce SATA hero" will need to compare the entire IOPCIATA class tree to the linux driver libata module, and find where the differences are.

 

I am not even able to compile IOATAController in debug/x86, as I'd like to see DLOG'd messages, like planetbeing did.. I must be missing something, cause my build crashes on boot.

 

 

P.S. Maybe someone also wants to do an "nForce SATA stress test" on DOS? I'd like to verify the results. Just create a spare FAT32 partition, boot a Win9x boot disk (MS-DOS 7) and try copying some files around, then (under another OS, or using DOS CRC checkers) check whether the files are corrupt.
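
If you don't have a CRC checker at hand, the comparison step can be sketched as below: compute a CRC-32 of the original file and of the copy, and compare the two values. This is a plain bitwise CRC-32 (the polynomial used by zip/gzip); the function names are invented for this sketch, and it is meant as a rough test aid, not a polished tool.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320.
// Pass 0 as `crc` for the first chunk and the previous result for later chunks.
inline std::uint32_t crc32Update(std::uint32_t crc, const unsigned char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    crc = ~crc;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the bit shifted out was 1.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// Reads the file in 4 KiB chunks. Returns false if it cannot be opened;
// otherwise stores the checksum in `out` and returns true.
inline bool crc32OfFile(const std::string& path, std::uint32_t& out) {
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in) return false;
    std::uint32_t crc = 0;
    char buf[4096];
    while (in.read(buf, sizeof buf) || in.gcount() > 0) {
        crc = crc32Update(crc, reinterpret_cast<const unsigned char*>(buf),
                          static_cast<std::size_t>(in.gcount()));
    }
    out = crc;
    return true;
}
```

Run `crc32OfFile` on the source file and on the copy that went through the SATA disk; if the two values differ, the copy is corrupt.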

Now that I'm enjoying stable Mac OS X on a PATA disk :(:poster_oops:;), I'm still researching.

I'm now using the VIA ATA driver again, as I have a stable nForceATA kext loading my XCode environment :).

 

I had an idea. Since I can't build IOATAController (as I explained, my builds crash), I overrode the function I'm interested in ("selectDevice") in the VIAATA driver, and enabled debug there. And I think I found something VERY interesting:

 

After loading the ext, the NTFS driver started complaining about corrupt data (as always).

But preceding the first IOATAController device blocking bus...

 

Aug 22 04:32:41 localhost kernel[0]: IOATA: BUSY can't select device. 
Aug 22 04:32:41 localhost kernel[0]: IOATAController device blocking bus.

 

Now THAT's what I call useful debug.

 

I said to myself... ok, now let's try to crash the kernel like the good old days: I opened a QuickTime movie on the SATA disk while copying three files. Lots of blocking bus errors. And then it froze, but this time it was DIFFERENT: QuickTime and the Finder were not responding, but the rest of the system WAS OK. Every application that tried to read the SATA disks crashed (Safari too). "Like the old days", as I said -- however, this time, Console was running.

 

So I had a look at the logs...

Aug 22 04:41:04 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 04:41:04 localhost kernel[0]: IOATAController device blocking bus.
Aug 22 04:41:15 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 04:41:15 localhost kernel[0]: IOATAController device blocking bus.

 

The message... has changed!!!! Before "freezing", only BUSY appeared in the logs. After "freezing", BUSY & DRQ appear in the logs.

It appears usually every 11 seconds, sometimes there is a quiet pause, then the errors come up again -- same interval.

 

 

Aug 22 04:42:32 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 04:42:32 localhost kernel[0]: IOATAController device blocking bus.
Aug 22 04:42:43 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 04:42:43 localhost kernel[0]: IOATAController device blocking bus.
Aug 22 04:42:54 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 04:42:54 localhost kernel[0]: IOATAController device blocking bus.
Aug 22 04:43:05 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 04:43:05 localhost kernel[0]: IOATAController device blocking bus.

 

I unmounted all of the drives. The error continued to appear at intervals, so it wasn't an application doing that -- it was the kernel.

 

Hope this info helps.

 

---------------------------------------------

 

 

More debugging

Every few milliseconds a selectDevice happens and it succeeds (kATANoErr). Sometimes it throws an "IOATA: BUSY can't select device": there's a timeout, and it happens randomly.

 

In order to throw "BUSY or DRQ can't select device" it needs to "really" select a new device, but that shouldn't happen that often as I have just one device. Another timeout needs to happen to show that error.
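
To make the two messages easier to tell apart, here is a small sketch of the status bits involved. BSY (0x80) and DRQ (0x08) are the standard ATA Status register bits; the `classify` helper and its enum are hypothetical and only mirror the wording of the log messages. They are not the IOATAController code.

```cpp
#include <cstdint>

// Standard ATA Status register bits.
constexpr std::uint8_t kStatusDRQ = 0x08; // device wants to transfer data
constexpr std::uint8_t kStatusBSY = 0x80; // device is busy

enum class BusState { Idle, DataRequest, Busy };

// Hypothetical classification of a raw status byte. BSY is checked first,
// because the other status bits are not meaningful while BSY is set.
constexpr BusState classify(std::uint8_t status) {
    return (status & kStatusBSY) ? BusState::Busy
         : (status & kStatusDRQ) ? BusState::DataRequest
         : BusState::Idle;
}

// A device should only be selected while both BSY and DRQ are clear.
constexpr bool canSelectDevice(std::uint8_t status) {
    return classify(status) == BusState::Idle;
}
```

Under these assumptions, a status of 0x50 (ready, idle) allows selection, while 0x80 (busy) and 0x58 (data request pending) do not.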

 

I'll try setting a longer timeout to see what happens, then I'll post the results here.

Edited by jape

GOOD NEWS!!!

 

I found that Linux polls the BUSY flag on the ATA controller up to 1000 times before timing out, but OS X only loops 10 times.... and this was causing the "IOATAController device blocking bus" message.

 

Adding an extra 1000-iteration timeout loop has COMPLETELY ELIMINATED THE BLOCKING BUS MESSAGE. And ALL my partitions have been detected CORRECTLY. (It usually missed a few random partitions while giving NTFS errors; this time, NO NTFS error.)
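
The idea behind the longer timeout can be sketched like this. It is a simulation, not driver code: `waitNotBusy` and `SimulatedDevice` are hypothetical names, the poll counts 10 and 1000 are just the figures mentioned above, and a real driver would also delay between reads of the status register.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kStatusBSY = 0x80; // standard ATA "busy" status bit

// Stand-in for a status register: reports BSY for the first `busyReads`
// reads, then an idle status.
struct SimulatedDevice {
    int busyReads;
    std::uint8_t operator()() {
        if (busyReads > 0) { --busyReads; return kStatusBSY; }
        return 0x50; // DRDY | DSC: ready and idle
    }
};

// Polls `readStatus` up to `maxPolls` times, waiting for BSY to clear.
// Returns the number of reads it took, or -1 if the budget ran out.
template <typename ReadStatus>
int waitNotBusy(ReadStatus readStatus, int maxPolls) {
    assert(maxPolls > 0);
    for (int poll = 1; poll <= maxPolls; ++poll) {
        if ((readStatus() & kStatusBSY) == 0) return poll;
    }
    return -1;
}
```

A device that stays busy for 50 reads times out with a budget of 10 polls, but is picked up on the 51st read with a budget of 1000. A device that never clears BSY would time out either way.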

 

This time I think I'm a step nearer to perfection!!!!!!!

 

Now I'm testing for corruption & freezing.

 

----------------------------------------------

 

BAD NEWS

 

During the stress test (QuickTime & mounting a 1 GiB DMG image), QuickTime & the DMG mounter froze, as always. In the logs, the first "IOATAController device blocking bus" of the entire session appeared.

Aug 22 18:14:26 localhost kernel[0]: IOATA: BUSY can't select device. 
Aug 22 18:14:26 localhost kernel[0]: IOATAController device blocking bus.
Aug 22 18:14:58 localhost kernel[0]: IOATA: selectDevice: unit != selectedUnit. 
Aug 22 18:15:09 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 18:15:09 localhost kernel[0]: IOATAController device blocking bus.
Aug 22 18:15:09 localhost kernel[0]: IOATA: selectDevice: unit != selectedUnit. 
Aug 22 18:15:13 localhost kernel[0]: (75: coreservicesd)tfp: failed on 0:
Aug 22 18:15:20 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 22 18:15:20 localhost kernel[0]: IOATAController device blocking bus.

 

It seems that a BUSY timeout is ALWAYS going to happen under stress conditions, so now I'm trying to check the Linux error recovery code for this.

Edited by jape

Thanks a lot :) .

 

I'm doing some more newbie debugging on selectDevice....

 

(Test I'm using: Mounting 1 GiB XCode DMG image from NTFS partition on SATA disk)

 

While the disk works, selectDevice works perfectly. It always selects disk 0, so it doesn't do anything.

Aug 23 01:29:49 localhost kernel[0]: IOATD: Trying to select device: i 
Aug 23 01:29:49 localhost kernel[0]: IOATD: Succesful device selection i 
Aug 23 01:29:49 localhost kernel[0]: IOATD: Trying to select device: i 
Aug 23 01:29:49 localhost kernel[0]: IOATD: Succesful device selection i

As you can see, no blocking bus.

 

Suddenly, the DMG mount dialog freezes. Less than a second after, the Console shows:

 

Aug 23 01:29:49 localhost kernel[0]: IOATD: Trying to select device: i
Aug 23 01:30:00 localhost kernel[0]: IOATA2: busy status: 0 
Aug 23 01:30:00 localhost kernel[0]: IOATA: BUSY can't select device. 
Aug 23 01:30:00 localhost kernel[0]: IOATAController device blocking bus.

read: Some OTHER operation somehow manages to trick the SATA controller into being BUSY forever. Even a 10 seconds timeout is not enough -- it is BUSY FOREVER, until you turn the power off

(yeah that's true, GRUB isn't able to warm boot Windows after this, I have to power off the system. I don't know if this happened before as I had Windows on PATA).

 

Then, another unexpected message shows up in the log. The selectedUnit is no longer "considered valid" (-1). I haven't found which instruction invalidates the selectedUnit, but it is NOT the one in selectDevice(), as that cannot run until we change disks, which is NOT happening on my system because, as I said, I only have one disk.

 

I don't know why it is invalidating the selectedUnit but I think that If someone finds the reason (and the file :) ) we may be nearer to the BIGGER failure.

 

 

Since the unit is invalidated, and the BUSY flag is ON, it goes on waiting forever.

If the system HFS partition were to be on this device, your system would be frozen forever. Which is what happens, you know :D :D ;).

 

Aug 23 01:30:32 localhost kernel[0]: IOATD: Trying to select device: i 
Aug 23 01:30:32 localhost kernel[0]: IOATA: selectDevice: unit i != selectedUnit i. 
Aug 23 01:30:43 localhost kernel[0]: IOATA2: Busy: 0 Request: 1 
Aug 23 01:30:43 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 23 01:30:43 localhost kernel[0]: IOATAController device blocking bus.
Aug 23 01:30:43 localhost kernel[0]: IOATD: Trying to select device: i 
Aug 23 01:30:43 localhost kernel[0]: IOATA: selectDevice: unit i != selectedUnit i.

(Busy on, request off) The 11 second pause is due to this:

_currentCommand->getTimeoutMS()/10;

 

I'm still waiting for someone who understands how ATA works. Why a BUSY flag can be raised forever. What can be done to reset it. Etc...

Once in my logs I saw NTFS driver errors coming seconds before the very first BUSY signal on selectDevice. This means that the problem is elsewhere... some I/O operation suddenly fails and raises the BUSY flag.

 

BTW, I've seen it resetting the flag, but I don't know what it does: the DMG mounter hangs, the device is invalidated, then .... the system freezes for a second, and selectDevice succeeds. The DMG mounter, instead of freezing forever, complains about a corrupted image & exits.

However, this hardly ever happens. As I said, usually selectDevice fails & fails & fails & fails...

 

So well, although I think I've identified the results of the problem, I'm still nowhere near the cause of it: I just know it happens randomly during I/O transfers.

I've found how it is able to recover by itself sometimes.

 

Watching a QuickTime video on SATA disk... the video pauses for a few seconds, then skips a few frames (I suppose this is standard QuickTime behaviour with corrupted stream data) and continues.

 

I open Console and this is what I see:

 

Aug 23 16:33:53 localhost kernel[0]: IOATD: Trying to select device: i 
Aug 23 16:34:04 localhost kernel[0]: IOATA2: busy status: 0 
Aug 23 16:34:04 localhost kernel[0]: IOATA: BUSY can't select device. 
Aug 23 16:34:04 localhost kernel[0]: IOATAController device blocking bus.
Aug 23 16:34:05 localhost kernel[0]: IOATA soft reset sequenced
Aug 23 16:34:05 localhost kernel[0]: IOATA reset complete.
Aug 23 16:34:05 localhost kernel[0]: IOATD: Trying to select device: i 
Aug 23 16:34:05 localhost kernel[0]: IOATA: selectDevice: unit i != selectedUnit i. 
Aug 23 16:34:05 localhost kernel[0]: IOATD: Succesful device selection i

 

It soft-resets the bus (whatever this is) and this is why selectedUnit is invalidated.

 

I'm now waiting for it to hang. Sometimes (due to Murphy's Law :( ) it works up to an hour without freezing, usually when I want it to crash :):P;)

 

---------------------------------------------------

 

Ok, I crashed it:

Aug 23 17:06:16 localhost kernel[0]: IOATD: Succesful device selection i 
Aug 23 17:06:16 localhost kernel[0]: IOATD: Trying to select device: i

---------- DMG mounter hangs here ----------------

Aug 23 17:06:28 localhost kernel[0]: IOATA2: busy status: 0 
Aug 23 17:06:28 localhost kernel[0]: IOATA: BUSY can't select device. 
Aug 23 17:06:28 localhost kernel[0]: IOATAController device blocking bus.
Aug 23 17:06:29 localhost kernel[0]: IOATA soft reset sequenced
Aug 23 17:07:00 localhost kernel[0]: IOATA device failed to reset.
Aug 23 17:07:00 localhost kernel[0]: IOATA reset complete.
Aug 23 17:07:00 localhost kernel[0]: IOATD: Trying to select device: i 
Aug 23 17:07:00 localhost kernel[0]: IOATA: selectDevice: unit i != selectedUnit i. 
Aug 23 17:07:11 localhost kernel[0]: IOATA2: Busy: 0 Request: 1 
Aug 23 17:07:11 localhost kernel[0]: IOATA: BUSY or DRQ can't select device. 
Aug 23 17:07:11 localhost kernel[0]: IOATAController device blocking bus.
Aug 23 17:07:11 localhost kernel[0]: IOATD: Trying to select device: i

Again, we are BUSY forever. The soft-reset fails because, after issuing it, the BUSY flag doesn't unset. I'm lost again.

 

Note: I now understand the periodic quiet pauses in the BUSY/DRQ state: they're IOATAController soft-resetting the bus.

 

BTW, a Darwin source comment says this behaviour means:

it is likely that this hardware is broken.

There's no recovery action if the drive fails to reset.

Edited by jape

For those of you who have an Asus A8N......

 

I decided to disable SATA DMA on BIOS. I imagined this would be unusable -- very slow. But, to my surprise, XBench scores with "DMA disabled" are slightly better (just around 2 MB/s) than with "DMA enabled". Really weird.

 

The VIA driver says it's using ultraMode 6 no matter the BIOS SATA DMA option.

 

Windows ignores the setting too.

 

Can you reproduce this? Any comments on whether corruption/crashes are less frequent with DMA disabled/enabled?

 

Note: It hangs here even with "DMA disabled", this is mere anecdote.
