ULE problems on HTT SMP

From: Andrew Gallatin <gallatin_at_cs.duke.edu>
Date: Fri, 27 Jun 2003 10:34:06 -0400 (EDT)

Jeff,

On an "SMP" box I have, which is really a p4 box with one physical
CPU, and 2 HTT cores, I've seen some strange behaviour with ULE.
With ULE enabled, I've seen jobs "wedge" for no apparent reason.
Some examples are fsck, dhclient and gcc.

Here's an example of fsck after it stopped responding:

load: 1.00  cmd: fsck_ufs 46 [physrd] 0.15u 0.31s 1% 1976k
[halt - sent]
Stopped at      siointr1+0xd5:  jmp     siointr1+0x200
db> ps
  pid   proc     addr    uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
   46 c41e05ac d8d9d000    0     1    43 0004002 [SLP]physrd 0xce588fe8] fsck_ufs
   42 c4025b58 d7b66000    0     1    42 0004002 [SLP]biord 0xce58a488] sh
   41 c4025d3c d7b67000    0     0     0 0000204 [SLP]nfsidl 0xc03f9b8c] nfsiod3
   40 c4173000 d7b96000    0     0     0 0000204 [SLP]nfsidl 0xc03f9b88] nfsiod2
   39 c41731e4 d7b97000    0     0     0 0000204 [SLP]nfsidl 0xc03f9b84] nfsiod1
   38 c41733c8 d7b98000    0     0     0 0000204 [SLP]nfsidl 0xc03f9b80] nfsiod0
   37 c41735ac d7b99000    0     0     0 0000204 [SLP]vlruwt 0xc41735ac] vnlru
   36 c4173790 d7b9a000    0     0     0 0000204 [SLP]syncer 0xc03cacc0] syncer
   35 c4173974 d7b9b000    0     0     0 0000204 [SLP]psleep 0xc03f7e3c] bufdaemon
   34 c4173b58 d7b9c000    0     0     0 000020c [SLP]pgzero 0xc03ffc08] pagezero
    9 c4173d3c d7b9d000    0     0     0 0000204 [SLP]psleep 0xc03ffc34] vmdaemon
    8 c4175000 d7b9e000    0     0     0 0000204 [SLP]psleep 0xc03ffc20] pagedaemon
   33 c41751e4 d7b9f000    0     0     0 0000204 new [IWAIT] irq8: rtc
   32 c3f795ac d7b2b000    0     0     0 0000204 new [IWAIT] irq0: clk
   31 c3f79790 d7b2c000    0     0     0 0000204 [IWAIT] irq6: fdc0
   30 c3f79974 d7b2d000    0     0     0 0000204 new [IWAIT] irq7: ppc0
   29 c3f79b58 d7b2e000    0     0     0 0000204 new [IWAIT] irq3: sio1
   28 c3f79d3c d7b2f000    0     0     0 0000204 new [IWAIT] irq4: sio0
   27 c4025000 d7b39000    0     0     0 0000204 [IWAIT] swi0: tty:sio
   26 c40251e4 d7b3a000    0     0     0 0000204 new [IWAIT] irq11: em0
   25 c40253c8 d7b3b000    0     0     0 0000204 [IWAIT] irq15: ata1
   24 c40255ac d7b3c000    0     0     0 0000204 [IWAIT] irq14: ata0
   23 c4025790 d7b3d000    0     0     0 0000204 new [IWAIT] irq5: fxp0
    7 c4025974 d7b3e000    0     0     0 0000204 [SLP]actask 0xc04e40cc] acpi_task2
    6 c150a1e4 d6929000    0     0     0 0000204 [SLP]actask 0xc04e40cc] acpi_task1
    5 c150a3c8 d692a000    0     0     0 0000204 [SLP]actask 0xc04e40cc] acpi_task0
   22 c150a5ac d692b000    0     0     0 0000204 new [IWAIT] irq9: acpi0
   21 c150a790 d692c000    0     0     0 0000204 [IWAIT] swi3: cambio
   20 c150a974 d692d000    0     0     0 0000204 new [IWAIT] swi2: camnet
   19 c150ab58 d692e000    0     0     0 0000204 new [IWAIT] swi5:+
   18 c150ad3c d6956000    0     0     0 0000204 new [IWAIT] swi6: task queue
   17 c3f79000 d7b28000    0     0     0 0000204 [IWAIT] swi6: acpitaskq
   16 c3f791e4 d7b29000    0     0     0 0000204 [SLP]sleep 0xc03b5dc0] random
    4 c3f793c8 d7b2a000    0     0     0 0000204 [RUNQ] g_down
    3 c1503000 d68d2000    0     0     0 0000204 [SLP]- 0xc03c41f8] g_up
    2 c15031e4 d6921000    0     0     0 0000204 [SLP]- 0xc03c41f0] g_event
   15 c15033c8 d6922000    0     0     0 0000204 new [IWAIT] swi4: vm
   14 c15035ac d6923000    0     0     0 000020c [IWAIT] swi7: tty:sio clock
   13 c1503790 d6924000    0     0     0 0000204 new [IWAIT] swi1: net
   12 c1503974 d6925000    0     0     0 000020c [CPU 0] idle: cpu0
   11 c1503b58 d6926000    0     0     0 000020c [CPU 1] idle: cpu1
    1 c1503d3c d6927000    0     0     1 0004200 [SLP]wait 0xc1503d3c] init
   10 c150a000 d6928000    0     0     0 0000204 [CV]ktrace 0xc03c7794] ktrace
    0 c03c42c0 c0513000    0     0     0 0000200 [SLP]sched 0xc03c42c0] swapper
db> c

load: 1.00  cmd: fsck_ufs 46 [physrd] 0.15u 0.31s 1% 1976k
load: 1.00  cmd: fsck_ufs 46 [physrd] 0.15u 0.31s 1% 1976k

At this point, fsck never makes any progress, and is unkillable with ^C. 


This is a kernel from Tuesday's sources.  The last time I updated the
machine was early April, and all worked fine then.

Drew
Received on Fri Jun 27 2003 - 05:34:14 UTC
