Re: RFC: ATA to CAM integration patch (and gjournaled previous nodes)

From: Alexander Motin <mav_at_FreeBSD.org>
Date: Sat, 25 Jul 2009 23:25:56 +0300

Juergen Lock wrote:
> On Sat, Jul 25, 2009 at 10:19:10PM +0300, Alexander Motin wrote:
>> Juergen Lock wrote:
>>> On Mon, Jul 06, 2009 at 11:16:46PM +0200, Juergen Lock wrote:
>>>> I tried this on the box with that optical drive that head no
>>>> longer likes (fails to be probed and generates an irq storm, see
>>>> 	http://docs.freebsd.org/cgi/mid.cgi?20090628101656.GA38983
>>>> ), and with ahci.ko loaded by loader.conf I got timeouts followed by
>>>> a panic:
>>>> 	http://people.freebsd.org/~nox/cam-ata.20090704-panic1.jpg
>>>> 	http://people.freebsd.org/~nox/cam-ata.20090704-panic2.jpg
>>>> [...]
>>> Ok I managed to dig myself out of this mess by connecting the problem
>>> drive to a jmicron pcie card that fell into my hands yesterday; I updated
>>> the test install to head from today and started reinstalling ports (bc of
>>> the shlib bumps) and testing the new hplip port on head (seems to work
>>> no worse than on 7), when suddenly ahci got problems: it printed endless
>>> retrying messages with the box' disk access led on solid, causing processes
>>> to get stuck.  I was still able to switch to a console and enter ddb,
>>> but dumping (call doadump) failed and I didn't know what to look for
>>> otherwise, so I'm afraid I can't give more info about this hang. :(
>>> Anyway, could this be caused by ncq?  I have disabled ahci.ko for now,
>>> we'll see if this `fixes' it...
>> It is difficult to say without seeing those messages. NCQ errors can
>> actually lead to a series of retries (up to 32): if several commands
>> were running when the error happened, all of the other commands are
>> aborted and retried after the error recovery process completes.
> 
> Ah so the recovery could take several minutes?  Maybe I didn't wait
> long enough then...

It depends on the number of errors. It would have to be an incredibly bad case, I think.
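
To illustrate why one error can produce up to 32 retries, here is a minimal sketch, not the actual driver code. It assumes the outstanding NCQ commands are tracked as a 32-bit tag mask; the name ncq_retries_after_error() is hypothetical. Since one error aborts everything in flight, the number of commands that have to be reissued after recovery is simply the number of tags that were active:

```c
#include <stdint.h>

#define NCQ_MAX_TAGS 32	/* NCQ tags are numbered 0..31 */

/*
 * Hypothetical helper: given a bitmask of the tags that were in flight
 * when an NCQ error occurred, return how many commands must be retried
 * once error recovery completes. Every outstanding command counts,
 * including the one that actually failed.
 */
static unsigned
ncq_retries_after_error(uint32_t outstanding_tags)
{
	unsigned tag, n = 0;

	for (tag = 0; tag < NCQ_MAX_TAGS; tag++) {
		if (outstanding_tags & (UINT32_C(1) << tag))
			n++;
	}
	return (n);
}
```

With a full queue (mask 0xffffffff) this gives 32 retries per error, which is the worst case mentioned above.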

>>  I haven't experimented with 
>> really broken drives, but artificially generated NCQ errors were handled 
>> properly in my tests.
>>
>  OK I guess I should take a photo next time it happens...  Btw, can the
> max # of `tags' be lowered with ncq too in case a drive can't handle
> too many?  I think it's `camcontrol tags' for scsi...

To allow some simplifications, the current implementation supports NCQ in 
an all-or-none fashion. If a drive reports queue support for fewer than 32 
commands, NCQ will not be used for it. It is not currently controllable via 
`camcontrol tags`, due to major differences between the operating 
principles of SATA NCQ and SCSI TCQ.
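
The all-or-none rule can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the function and macro names are hypothetical, and the IDENTIFY DEVICE layout used (word 75 bits 4:0 holding the maximum queue depth minus one, word 76 bit 8 flagging NCQ support) is taken from the ATA specification. Validity checks of word 76 (0x0000 and 0xffff) are left out for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* IDENTIFY DEVICE word layout per the ATA specification. */
#define ID_WORD_QUEUE_DEPTH	75		/* bits 4:0 = max depth - 1 */
#define ID_WORD_SATA_CAP	76
#define ID_SATA_CAP_NCQ		(1u << 8)	/* NCQ feature set supported */

#define NCQ_REQUIRED_DEPTH	32		/* all-or-none threshold */

/* Hypothetical helper: queue depth reported by the drive (1..32). */
static unsigned
ident_queue_depth(const uint16_t *ident)
{
	return ((ident[ID_WORD_QUEUE_DEPTH] & 0x1f) + 1);
}

/*
 * Hypothetical helper: use NCQ only if the drive advertises it and
 * reports the full 32-command queue; otherwise do not use NCQ at all.
 */
static bool
ident_use_ncq(const uint16_t *ident)
{
	if ((ident[ID_WORD_SATA_CAP] & ID_SATA_CAP_NCQ) == 0)
		return (false);
	return (ident_queue_depth(ident) >= NCQ_REQUIRED_DEPTH);
}
```

So a drive reporting, say, a depth of 16 would simply run without NCQ rather than with a reduced tag count.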

>>>  Here is the dmesg with ahci and the jmicron:
>>>
>>> atapci0: <JMicron JMB363 SATA300 controller> port 0xbf00-0xbf07,0xbe00-0xbe03,0xbd00-0xbd07,0xbc00-0xbc03,0xbb00-0xbb0f mem 0xfd8fe000-0xfd8fffff irq 17 at device 0.0 on pci2
>>> atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xbb00
>>> ioapic0: routing intpin 17 (PCI IRQ 17) to lapic 0 vector 49
>>> atapci0: [MPSAFE]
>>> atapci0: [ITHREAD]
>>> atapci0: Reserved 0x2000 bytes for rid 0x24 type 3 at 0xfd8fe000
>>> atapci0: AHCI called from vendor specific driver
>>> atapci0: AHCI v1.00 controller with 2 3Gbps ports, PM supported
>>> atapci0: Caps: 64bit NCQ ALP AL CLO 3Gbps PM PMD SSC PSC 32cmd 2ports
>>> ata2: <ATA channel 0> on atapci0
>>> ata2: AHCI reset...
>>> ata2: hardware reset ...
>>> ata2: SATA connect timeout status=00000000
>>> ata2: AHCI reset done: phy reset found no device
>>> ata2: [MPSAFE]
>>> ata2: [ITHREAD]
>>> ata3: <ATA channel 1> on atapci0
>>> ata3: AHCI reset...
>>> ata3: hardware reset ...
>>> ata3: SATA connect time=0ms status=00000113
>>> ata3: ready wait time=11ms
>>> ata3: software reset port 15...
>>> ata3: ready wait time=0ms
>>> ata3: SIGNATURE: eb140101
>>> ata3: AHCI reset done: devices=00010000
>>> ata3: [MPSAFE]
>>> ata3: [ITHREAD]
>>> ata4: <ATA channel 2> on atapci0
>>> atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0xbf00
>>> atapci0: Reserved 0x4 bytes for rid 0x14 type 4 at 0xbe00
>>> ata4: reset tp1 mask=03 ostat0=60 ostat1=70
>>> ata4: stat0=0x20 err=0x20 lsb=0x20 msb=0x20
>>> ata4: stat1=0x30 err=0x30 lsb=0x30 msb=0x30
>>> ata4: reset tp2 stat0=20 stat1=30 devices=0x0
>>> ata4: [MPSAFE]
>>> ata4: [ITHREAD]
>> As far as I can see here, your JMicron is configured for combined mode, 
>> not for plain AHCI, so it was handled by ata(4), not by ahci(4).
> 
>  Ah that can be configured?  Anyway there's only an optical drive on
> it atm so it's probably not _that_ important. :)

On my system it can be configured via BIOS settings.

-- 
Alexander Motin
Received on Sat Jul 25 2009 - 18:26:30 UTC
