Re: RFC: ATA to CAM integration patch (and gjournaled previous nodes)

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Sat, 25 Jul 2009 22:19:10 +0300

Juergen Lock wrote:
> On Mon, Jul 06, 2009 at 11:16:46PM +0200, Juergen Lock wrote:
>> I tried this on the box with that optical drive that head no
>> longer likes (fails to be probed and generates an irq storm, see
>> 	http://docs.freebsd.org/cgi/mid.cgi?20090628101656.GA38983
>> ), and with ahci.ko loaded by loader.conf I got timeouts followed by
>> a panic:
>> 	http://people.freebsd.org/~nox/cam-ata.20090704-panic1.jpg
>> 	http://people.freebsd.org/~nox/cam-ata.20090704-panic2.jpg
>> [...]
> 
> Ok I managed to dig myself out of this mess by connecting the problem
> drive to a jmicron pcie card that fell into my hands yesterday; I updated
> the test install to head from today and started reinstalling ports (bc of
> the shlib bumps) and testing the new hplip port on head (seems to work
> no worse than on 7), when suddenly ahci got problems: it printed endless
> retrying messages with the box' disk access led on solid, causing processes
> to get stuck.  I was still able to switch to a console and enter ddb,
> but dumping (call doadump) failed and I didn't know what to look for
> otherwise, so I'm afraid I can't give more info about this hang. :(
> Anyway, could this be caused by ncq?  I have disabled ahci.ko for now,
> we'll see if this `fixes' it...

Difficult to say without seeing those messages. NCQ errors can actually
lead to a series of retries (up to 32): if several commands were running
when the error happened, all the other commands are aborted and retried
after the error recovery process completes. I haven't experimented with
really broken drives, but artificially generated NCQ errors were handled
properly in my tests.
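
To illustrate why a single NCQ error can produce a burst of retry messages, here is a minimal sketch of the behaviour described above. It is a toy model, not the ahci(4) code: the function name `commands_to_retry` and the tag lists are hypothetical, and what happens to the failing command itself is deliberately left out, since that is not stated here.

```python
MAX_NCQ_TAGS = 32  # an NCQ queue holds at most 32 tagged commands


def commands_to_retry(in_flight_tags, failed_tag):
    """Return the tags of the other commands that get aborted and retried.

    Toy model (an assumption, not the driver's logic): when one queued
    command fails, every other command that was in flight is aborted and
    retried once error recovery has completed.
    """
    in_flight = set(in_flight_tags)
    if len(in_flight) > MAX_NCQ_TAGS:
        raise ValueError("more tags than an NCQ queue can hold")
    if failed_tag not in in_flight:
        raise ValueError("failed tag was not in flight")
    # Everything except the failing command is aborted and retried.
    return sorted(in_flight - {failed_tag})


if __name__ == "__main__":
    # Full queue, command with tag 5 fails.
    retried = commands_to_retry(range(MAX_NCQ_TAGS), failed_tag=5)
    print(len(retried), "other commands aborted and retried")
```

With a full queue, one failing command drags all the remaining queued commands through a retry, so a long run of retry messages does not by itself mean that many separate errors occurred.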

>  Here is the dmesg with ahci and the jmicron:
> 
> atapci0: <JMicron JMB363 SATA300 controller> port 0xbf00-0xbf07,0xbe00-0xbe03,0xbd00-0xbd07,0xbc00-0xbc03,0xbb00-0xbb0f mem 0xfd8fe000-0xfd8fffff irq 17 at device 0.0 on pci2
> atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xbb00
> ioapic0: routing intpin 17 (PCI IRQ 17) to lapic 0 vector 49
> atapci0: [MPSAFE]
> atapci0: [ITHREAD]
> atapci0: Reserved 0x2000 bytes for rid 0x24 type 3 at 0xfd8fe000
> atapci0: AHCI called from vendor specific driver
> atapci0: AHCI v1.00 controller with 2 3Gbps ports, PM supported
> atapci0: Caps: 64bit NCQ ALP AL CLO 3Gbps PM PMD SSC PSC 32cmd 2ports
> ata2: <ATA channel 0> on atapci0
> ata2: AHCI reset...
> ata2: hardware reset ...
> ata2: SATA connect timeout status=00000000
> ata2: AHCI reset done: phy reset found no device
> ata2: [MPSAFE]
> ata2: [ITHREAD]
> ata3: <ATA channel 1> on atapci0
> ata3: AHCI reset...
> ata3: hardware reset ...
> ata3: SATA connect time=0ms status=00000113
> ata3: ready wait time=11ms
> ata3: software reset port 15...
> ata3: ready wait time=0ms
> ata3: SIGNATURE: eb140101
> ata3: AHCI reset done: devices=00010000
> ata3: [MPSAFE]
> ata3: [ITHREAD]
> ata4: <ATA channel 2> on atapci0
> atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0xbf00
> atapci0: Reserved 0x4 bytes for rid 0x14 type 4 at 0xbe00
> ata4: reset tp1 mask=03 ostat0=60 ostat1=70
> ata4: stat0=0x20 err=0x20 lsb=0x20 msb=0x20
> ata4: stat1=0x30 err=0x30 lsb=0x30 msb=0x30
> ata4: reset tp2 stat0=20 stat1=30 devices=0x0
> ata4: [MPSAFE]
> ata4: [ITHREAD]

As far as I can see here, your JMicron is configured for combined mode,
not for plain AHCI, so it was handled by ata(4), not by ahci(4).
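
For anyone reading the quoted dmesg, the `status=` and `SIGNATURE:` values can be decoded by hand. The sketch below assumes `status=` is the raw SATA SStatus register and uses the field layout and signature constants from the SATA/AHCI specifications; the helper names are hypothetical and none of this is taken from the driver source.

```python
# Well-known device signatures reported after a reset.
SIGNATURES = {
    0x00000101: "ATA disk",
    0xEB140101: "ATAPI device",
    0x96690101: "port multiplier",
    0xC33C0101: "enclosure management bridge",
}


def decode_sstatus(value):
    """Split an SStatus value into its DET, SPD and IPM fields."""
    return {
        "det": value & 0xF,         # 3 = device present, PHY link established
        "spd": (value >> 4) & 0xF,  # negotiated speed generation (1 = 1.5 Gbps)
        "ipm": (value >> 8) & 0xF,  # interface power state (1 = active)
    }


def describe_signature(value):
    """Name a device signature, or report it as unknown."""
    return SIGNATURES.get(value, "unknown")


if __name__ == "__main__":
    print(decode_sstatus(0x00000113))      # ata3 in the dmesg above
    print(decode_sstatus(0x00000000))      # ata2: nothing connected
    print(describe_signature(0xEB140101))  # ata3 signature
```

Read this way, ata3 shows an established link at the first speed generation with an ATAPI signature, which fits an optical drive, while ata2 reports no device at all.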

-- 
Alexander Motin
Received on Sat Jul 25 2009 - 17:19:42 UTC
