ata irq storm/probe failure; tdq_notify pagefault (was: pmap_extract panic @boot (20090623-JPSNAP-amd64-dvd1.iso))

From: Juergen Lock <nox_at_jelal.kn-bremen.de>
Date: Sat, 27 Jun 2009 00:26:29 +0200
On Thu, Jun 25, 2009 at 10:25:36PM +0200, Juergen Lock wrote:
> I wanted to try a head snapshot iso today and it failed to boot:
> 	http://people.freebsd.org/~nox/20090623-JPSNAP-amd64-bootpanic1.jpg
> 	http://people.freebsd.org/~nox/20090623-JPSNAP-amd64-bootpanic2.jpg
> (I tried to get a serial console but couldn't get this box' serial
> port working for some reason... ):
> 
>  As you can see in the jpegs it panic'd at pmap_extract+0x106, which
> I _think_ is in line 963: (unfortunately there's no kernel.symbols on
> the iso)
> 	pdep = pmap_pde(pmap, va);
> (see
> 	http://fxr.watson.org/fxr/source/amd64/amd64/pmap.c#L963
> ), and that got invoked from isa_dmarangecheck+0x98:
> 	http://fxr.watson.org/fxr/source/amd64/isa/isa_dma.c#L374
> [...]

After I realized ppc_isa_attach() belongs to the parallel port driver
which is unused on this box I was able to work around this issue by
disabling it in the bios.  But the dmesg I posted already points at
the next problem...

> [...]
> start_init: trying /sbin/init
> start_init: trying /sbin/oinit
> start_init: trying /sbin/init.bak
> start_init: trying /rescue/init
> start_init: trying /stand/sysinstall
> ata5: CONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> ata5: SATA connect timeout status=00000001
> ata5: AHCI reset done: phy reset found no device
> ata5: reinit done ..
> ata5: DISCONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> ata5: SATA connect timeout status=00000001
> ata5: AHCI reset done: phy reset found no device
> ata5: reinit done ..
> ata5: CONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> ata5: SATA connect timeout status=00000001
> ata5: AHCI reset done: phy reset found no device
> ata5: reinit done ..
> ata5: DISCONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> ata5: SATA connect timeout status=00000000
> ata5: AHCI reset done: phy reset found no device
> ata5: reinit done ..
> ata5: CONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> ata5: SATA connect timeout status=00000000
> ata5: AHCI reset done: phy reset found no device
> ata5: reinit done ..
> ata5: DISCONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> ata5: SATA connect timeout status=00000001
> ata5: AHCI reset done: phy reset found no device
> ata5: reinit done ..
> ata5: CONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> ata5: SATA connect timeout status=00000001
> ata5: AHCI reset done: phy reset found no device
> ata5: reinit done ..
> ata5: DISCONNECT requested
> ata5: reiniting channel ..
> ata5: AHCI reset...
> ata5: hardware reset ...
> [...]

i.e. ata5 isnt probed correctly anymore (its an optical drive), and
meanwhile I discovered there's an irq storm on irq 22 too (shared between
atapci0 and fwohci0), causing one cpu to be at near 100%.  This doesn't
seem to occur with a 20090315 snapshot so _maybe_ its related to this
commit:

mav         2009-03-30 22:18:38 UTC

  FreeBSD src repository

  Modified files:
    sys/dev/ata          ata-pci.c ata-pci.h ata-sata.c 
    sys/dev/ata/chipsets ata-ahci.c ata-intel.c ata-jmicron.c 
                         ata-marvell.c ata-nvidia.c ata-promise.c 
                         ata-siliconimage.c ata-sis.c ata-via.c 
  Log:
  SVN rev 190581 on 2009-03-30 22:18:38Z by mav
  
  Integrate user/mav/ata branch:
  
  Add ch_suspend/ch_resume methods for PCI controllers and implement them
  for AHCI. Refactor AHCI channel initialization according to it.
  
  Fix Port Multipliers operation. It is far from perfect yet, but works now.
  Tested with JMicron JMB363 AHCI + SiI 3726 PMP pair.
  Previous version was also tested with SiI 4726 PMP.
  
  Hardware sponsored by: Vitsch Electronics / VEHosting.nl

(I first tried manually reverting commit 191568 in the 20090623 kernel
as this box has an amd/ati sb600, but that seemed to make no difference.)

 While doing that I discovered another problem tho: I installed onto an
usb key for testing, and booting that panic'd in tdq_notify+0x4e with
a pagefault due to a null pointer here:
	movzbl 0x27a(%rax),%edx
(it was trying to read from 0x27a; the backtrace was almost empty as
this seems to be part of the scheduler - I can post the backtrace too
if anyone's interested.)

 I then got it to boot into single user, and from there got into multiuser,
and then discovered the ata issue...

 The good news is the new usb stack seems to like my printer better!
(I didn't actually test printing itself but at least it got detected
at boot, which on 7-stable it doesn't, I always have to plug it into
another port atm after boot before I can use it...)

 Ok that should be enough for now.  Cheers, :)
	Juergen

PS: ata5 related parts of stable dmesg:

...
ata5: <ATA channel 3> on atapci0
ata5: SATA connect time=0ms
ata5: SIGNATURE: eb140101
ata5: ahci_reset devices=0x4<ATAPI_MASTER>
ata5: [MPSAFE]
ata5: [ITHREAD]
...
ata5-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=40 wire
ata5: device_reset timeout=660us
acd1: <ATAPI BD B DH4B1S/7P5B> DVDR drive at ata5 as master
acd1: read 5512KB/s (5512KB/s) write 5512KB/s (5512KB/s), 16384KB buffer, SATA150
acd1: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, packet
acd1: Writes: CDR, CDRW, DVDR, test write, burnproof
acd1: Audio: play, 2 volume levels
acd1: Mechanism: ejectable tray, unlocked
acd1: Medium: no/blank disc
...

 ...and the other optical drive that also gets detected on head (this
dmesg still from stable:)

...
ata3: <ATA channel 1> on atapci0
ata3: SATA connect time=0ms
ata3: SIGNATURE: eb140101
ata3: ahci_reset devices=0x4<ATAPI_MASTER>
ata3: [MPSAFE]
ata3: [ITHREAD]
...
ata3-master: pio=PIO4 wdma=WDMA2 udma=UDMA66 cable=40 wire
ata3: device_reset timeout=120us
acd0: <PIONEER DVD-RW DVR-216D/1.09> DVDR drive at ata3 as master
acd0: read 6890KB/s (6890KB/s) write 5511KB/s (6890KB/s), 2000KB buffer, SATA150
acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, DVDRAM, packet
acd0: Writes: CDR, CDRW, DVDR, test write, burnproof
acd0: Audio: play, 256 volume levels
acd0: Mechanism: ejectable tray, unlocked
acd0: Medium: no/blank disc
...

 ..so maybe (wild guess!) UDMA100 is related?  Here is the head dmesg
related to that drive for comparison:

...
ata3: <ATA channel 1> on atapci0
ata3: AHCI reset...
ata3: hardware reset ...
ata3: SATA connect time=10ms status=00000113
ata3: ready wait time=183ms
ata3: software reset port 15...
ata3: ahci_issue_cmd timeout: 3000 of 3000ms, status=00000001
ata3: port is not ready (timeout 0ms) tfd = 00000180
ata3: software reset clear timeout
ata3: software reset port 0...
ata3: ready wait time=0ms
ata3: SIGNATURE: eb140101
ata3: AHCI reset done: devices=00010000
ata3: [MPSAFE]
ata3: [ITHREAD]
...
ata3: Identifying devices: 00010000
ata3: New devices: 00010000
ata3-master: pio=PIO4 wdma=WDMA2 udma=UDMA66 cable=40 wire
ata3: device_reset timeout=120us
uhub3: 6 ports with 6 removable, self powered
uhub6: 6 ports with 6 removable, self powered
acd0: <PIONEER DVD-RW DVR-216D/1.09> DVDR drive at ata3 as master
acd0: read 17583KB/s (17583KB/s) write 5410KB/s (5410KB/s), 2000KB buffer,
+SATA150
acd0: Reads: CDR, CDRW, CDDA stream, DVDROM, DVDR, DVDRAM, packet
acd0: Writes: CDR, CDRW, DVDR, test write, burnproof
acd0: Audio: play, 256 volume levels
acd0: Mechanism: ejectable tray, unlocked
acd0: Medium: DVD 120mm data disc
Received on Fri Jun 26 2009 - 20:32:57 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:50 UTC