ULE crash

From: Ian Freislich <ianf_at_za.uu.net>
Date: Wed, 25 Jun 2003 18:20:33 +0200
Hi

About 4.5 minutes after rebooting with a SCHED_ULE kernel (I give
ULE a go every few months), top started looking really weird (the
CPU % just kept accumulating for each process). Before dnetc
started, httpd showed 17% CPU, even though top reported the system
as 100% idle at the time.  Then dnetc started and things got weird.

last pid:   607;  load averages:  1.83,  0.63,  0.25    up 0+00:04:23  16:00:48
35 processes:  3 running, 32 sleeping
CPU states:  0.0% user, 99.0% nice,  0.6% system,  0.4% interrupt,  0.0% idle
Mem: 20M Active, 14M Inact, 19M Wired, 20K Cache, 25M Buf, 130M Free
Swap: 512M Total, 512M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
  603 ianf      139   20  1072K   880K RUN    0   0:39 105.47% 105.47% dnetc
  575 ianf      139   20  1072K   880K CPU1   1   1:15 102.34% 102.34% dnetc
  505 root       76    0  7208K  5420K select 0   0:01 17.97% 17.97% httpd
  375 root        4    0  1276K   948K accept 0   0:00  9.38%  9.38% nfsd
  526 nobody     76    0  9280K  8564K select 1   0:04  5.47%  5.47% squid
  607 ianf       76    0  2196K  1444K CPU0   0   0:00  2.34%  2.34% top

Then it froze.  When I got home I found that it had at least dumped
vmcore.24.  I'll keep it around for a while and perform any inspections
people want me to.  This was with sources updated at 13h30 GMT today.

panic: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
cpuid = 1; lapic.id = 01000000
fault virtual address   = 0x38
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01e094d
stack pointer           = 0x10:0xce772be4
frame pointer           = 0x10:0xce772bf4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 603 (dnetc)
trap number             = 12
panic: page fault
cpuid = 1; lapic.id = 01000000
Stack backtrace:
boot() called on cpu#1

syncing disks, buffers remaining... panic: absolutely cannot call smp_ipi_shootdown with interrupts already disabled
cpuid = 1; lapic.id = 01000000
boot() called on cpu#1
Uptime: 4m15s
Dumping 191 MB
ata0: resetting devices ..
done
 16 32 48 64 80 96 112 128 144 160 176
---

(kgdb) bt
#0  doadump () at ../../../kern/kern_shutdown.c:240
#1  0xc01cbe7f in boot (howto=260) at ../../../kern/kern_shutdown.c:372
#2  0xc01cc2b8 in panic () at ../../../kern/kern_shutdown.c:550
#3  0xc02e8f89 in smp_tlb_shootdown (vector=0, addr1=0, addr2=0)
    at ../../../i386/i386/mp_machdep.c:2356
#4  0xc02e92a9 in smp_invlpg_range (addr1=0, addr2=0)
    at ../../../i386/i386/mp_machdep.c:2488
#5  0xc02eb548 in pmap_invalidate_range (pmap=0xc03996e0, sva=3365310464, 
    eva=3365314560) at ../../../i386/i386/pmap.c:721
#6  0xc02eb83d in pmap_qenter (sva=3365310464, m=0xce772884, count=0)
    at ../../../i386/i386/pmap.c:948
#7  0xc0218a31 in vm_hold_load_pages (bp=0xc76039a0, from=0, to=3365318656)
    at ../../../kern/vfs_bio.c:3574
#8  0xc0216f5a in allocbuf (bp=0xc76039a0, size=8192)
    at ../../../kern/vfs_bio.c:2752
#9  0xc0216cee in geteblk (size=8192) at ../../../kern/vfs_bio.c:2634
#10 0xc0213980 in bwrite (bp=0xc75b65d8) at ../../../kern/vfs_bio.c:818
#11 0xc02142dc in bawrite (bp=0x0) at ../../../kern/vfs_bio.c:1153
#12 0xc021d89a in vop_stdfsync (ap=0xce772a14)
    at ../../../kern/vfs_default.c:742
#13 0xc0193570 in spec_fsync (ap=0xce772a14)
    at ../../../fs/specfs/spec_vnops.c:417
#14 0xc0192a38 in spec_vnoperate (ap=0x0)
    at ../../../fs/specfs/spec_vnops.c:122
#15 0xc0294c62 in ffs_sync (mp=0xc3950a00, waitfor=2, cred=0xc0d06e80, 
    td=0xc03702a0) at vnode_if.h:624
#16 0xc022b15b in sync (td=0xc03702a0, uap=0x0)
    at ../../../kern/vfs_syscalls.c:142
#17 0xc01cb9a1 in boot (howto=256) at ../../../kern/kern_shutdown.c:281
#18 0xc01cc2b8 in panic () at ../../../kern/kern_shutdown.c:550
#19 0xc02f0da2 in trap_fatal (frame=0xce772ba4, eva=0)
    at ../../../i386/i386/trap.c:836
#20 0xc02f0333 in trap (frame=
      {tf_fs = -1060044776, tf_es = -831062000, tf_ds = -1071775728, tf_edi = -1014422336, tf_esi = -1070107520, tf_ebp = -831050764, tf_isp = -831050800, tf_ebx = 0, tf_edx = 0, tf_ecx = -1059988168, tf_eax = 0, tf_trapno = 12, tf_err = 0, tf_eip = -1071773363, tf_cs = 8, tf_eflags = 66194, tf_esp = -1070107520, tf_ss = 0}) at ../../../i386/i386/trap.c:256
#21 0xc02d8eb8 in calltrap () at {standard input}:97
#22 0xc01e188b in sched_choose () at ../../../kern/sched_ule.c:1161
#23 0xc01d25e6 in choosethread () at ../../../kern/kern_switch.c:140
#24 0xc01d422f in mi_switch () at ../../../kern/kern_synch.c:525
#25 0xc01c1db6 in _mtx_lock_sleep (m=0xc0374a40, opts=0, file=0x0, line=0)
    at ../../../kern/kern_mutex.c:636
#26 0xc01ca585 in getrusage (td=0x0, uap=0xce772d10)
    at ../../../kern/kern_resource.c:773
#27 0xc02f10fc in syscall (frame=
      {tf_fs = 47, tf_es = 47, tf_ds = 47, tf_edi = 135360172, tf_esi = 135336096, tf_ebp = -1077938416, tf_isp = -831050380, tf_ebx = -1077938416, tf_edx = 0, tf_ecx = 0, tf_eax = 117, tf_trapno = 0, tf_err = 2, tf_eip = 134789976, tf_cs = 31, tf_eflags = 659, tf_esp = -1077938572, tf_ss = 47})
    at ../../../i386/i386/trap.c:1023
#28 0xc02d8f0d in Xint0x80_syscall () at {standard input}:139
---Can't read userspace from dump, or kernel process---
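
A note for reading the backtrace: kgdb prints the trap-frame registers (frame #20) as signed decimals and the pmap addresses (frame #5) as unsigned decimals, which makes them hard to match against the hex values in the panic message. The following is a minimal sketch of the conversion, assuming 32-bit i386 registers; the helper name `to_u32_hex` is made up for illustration and is not part of kgdb.

```python
def to_u32_hex(value: int) -> str:
    """Reinterpret an integer as an unsigned 32-bit value and format it as hex."""
    return f"0x{value & 0xFFFFFFFF:08x}"


# Values copied from the trap frame in frame #20 of the backtrace.
tf_eip = to_u32_hex(-1071773363)
tf_ebp = to_u32_hex(-831050764)

# Values copied from the pmap_invalidate_range arguments in frame #5.
sva = to_u32_hex(3365310464)
eva = to_u32_hex(3365314560)

print("tf_eip =", tf_eip)
print("tf_ebp =", tf_ebp)
print("sva    =", sva)
print("eva    =", eva)
```

With these inputs, tf_eip comes out as 0xc01e094d and tf_ebp as 0xce772bf4, matching the "instruction pointer" and "frame pointer" lines in the panic message, and sva and eva differ by 0x1000, i.e. one 4 KB page.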
Received on Wed Jun 25 2003 - 07:20:37 UTC
