Re: RFC: ATA to CAM integration patch (and gjournaled previous nodes)

From: Juergen Lock <nox_at_jelal.kn-bremen.de>
Date: Sat, 25 Jul 2009 22:11:14 +0200

On Sat, Jul 25, 2009 at 10:19:10PM +0300, Alexander Motin wrote:
> Juergen Lock wrote:
> > On Mon, Jul 06, 2009 at 11:16:46PM +0200, Juergen Lock wrote:
> >> I tried this on the box with that optical drive that head no
> >> longer likes (fails to be probed and generates an irq storm, see
> >> 	http://docs.freebsd.org/cgi/mid.cgi?20090628101656.GA38983
> >> ), and with ahci.ko loaded by loader.conf I got timeouts followed by
> >> a panic:
> >> 	http://people.freebsd.org/~nox/cam-ata.20090704-panic1.jpg
> >> 	http://people.freebsd.org/~nox/cam-ata.20090704-panic2.jpg
> >> [...]
> > 
> > Ok I managed to dig myself out of this mess by connecting the problem
> > drive to a jmicron pcie card that fell into my hands yesterday; I updated
> > the test install to head from today and started reinstalling ports (because
> > of the shlib bumps) and testing the new hplip port on head (seems to work
> > no worse than on 7), when suddenly ahci got problems: it printed endless
> > retrying messages with the box's disk access LED on solid, causing processes
> > to get stuck.  I was still able to switch to a console and enter ddb,
> > but dumping (call doadump) failed and I didn't know what to look for
> > otherwise, so I'm afraid I can't give more info about this hang. :(
> > Anyway, could this be caused by ncq?  I have disabled ahci.ko for now,
> > we'll see if this `fixes' it...
> 
> Difficult to say without seeing those messages. An NCQ error can actually
> lead to a series of retries (up to 32): if several commands were running
> when the error happened, all the other commands are aborted and retried
> after the error recovery process completes.

Ah so the recovery could take several minutes?  Maybe I didn't wait
long enough then...
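
To make the retry count concrete, here is a toy model of the behaviour described above. It is only a sketch, not the ahci(4) code: the function name `commands_to_retry`, the constant `NCQ_SLOTS`, and the assumption that the failing command is requeued along with every other outstanding one are mine.

```python
# Toy model of the NCQ retry behaviour described above; not the ahci(4) code.
NCQ_SLOTS = 32  # assumption: one tag per command slot, matching "32cmd" in the dmesg


def commands_to_retry(outstanding_tags, failed_tag):
    """Return the tags that get retried after one NCQ error.

    Assumption: the failing command and every other command that was
    running at the time are all requeued once error recovery completes.
    """
    tags = set(outstanding_tags)
    if failed_tag not in tags:
        raise ValueError("failed tag was not outstanding")
    if any(t < 0 or t >= NCQ_SLOTS for t in tags):
        raise ValueError("tag outside 0..%d" % (NCQ_SLOTS - 1))
    return sorted(tags)


if __name__ == "__main__":
    # One error with a full queue: count how many commands are requeued.
    print(len(commands_to_retry(range(NCQ_SLOTS), failed_tag=5)))
```

If that model is right, a single error on a busy drive would account for a burst of up to 32 retry messages on its own, without the drive having failed 32 separate times.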

>  I haven't experimented with
> really broken drives, but artificially generated NCQ errors were handled
> properly in my tests.
> 
OK I guess I should take a photo next time it happens...  Btw, can the
max # of `tags' be lowered with ncq too in case a drive can't handle
too many?  I think it's `camcontrol tags' for scsi...

> >  Here is the dmesg with ahci and the jmicron:
> > 
> > atapci0: <JMicron JMB363 SATA300 controller> port 0xbf00-0xbf07,0xbe00-0xbe03,0xbd00-0xbd07,0xbc00-0xbc03,0xbb00-0xbb0f mem 0xfd8fe000-0xfd8fffff irq 17 at device 0.0 on pci2
> > atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xbb00
> > ioapic0: routing intpin 17 (PCI IRQ 17) to lapic 0 vector 49
> > atapci0: [MPSAFE]
> > atapci0: [ITHREAD]
> > atapci0: Reserved 0x2000 bytes for rid 0x24 type 3 at 0xfd8fe000
> > atapci0: AHCI called from vendor specific driver
> > atapci0: AHCI v1.00 controller with 2 3Gbps ports, PM supported
> > atapci0: Caps: 64bit NCQ ALP AL CLO 3Gbps PM PMD SSC PSC 32cmd 2ports
> > ata2: <ATA channel 0> on atapci0
> > ata2: AHCI reset...
> > ata2: hardware reset ...
> > ata2: SATA connect timeout status=00000000
> > ata2: AHCI reset done: phy reset found no device
> > ata2: [MPSAFE]
> > ata2: [ITHREAD]
> > ata3: <ATA channel 1> on atapci0
> > ata3: AHCI reset...
> > ata3: hardware reset ...
> > ata3: SATA connect time=0ms status=00000113
> > ata3: ready wait time=11ms
> > ata3: software reset port 15...
> > ata3: ready wait time=0ms
> > ata3: SIGNATURE: eb140101
> > ata3: AHCI reset done: devices=00010000
> > ata3: [MPSAFE]
> > ata3: [ITHREAD]
> > ata4: <ATA channel 2> on atapci0
> > atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0xbf00
> > atapci0: Reserved 0x4 bytes for rid 0x14 type 4 at 0xbe00
> > ata4: reset tp1 mask=03 ostat0=60 ostat1=70
> > ata4: stat0=0x20 err=0x20 lsb=0x20 msb=0x20
> > ata4: stat1=0x30 err=0x30 lsb=0x30 msb=0x30
> > ata4: reset tp2 stat0=20 stat1=30 devices=0x0
> > ata4: [MPSAFE]
> > ata4: [ITHREAD]
> 
> As I can see here, your JMicron is configured for combined mode, not for
> plain AHCI, so it was handled by ata(4), not by ahci(4).

Ah that can be configured?  Anyway there's only an optical drive on
it atm so it's probably not _that_ important. :)

 Thanx,
	Juergen