2010/5/9 Jeff Roberson <jroberson_at_jroberson.net>: > On Sat, 8 May 2010, Ulrich Sp?rlein wrote: > >> On Sat, 08.05.2010 at 18:00:50 +0200, Attilio Rao wrote: >>> >>> 2010/5/8 Ulrich Sp?rlein <uqs_at_spoerlein.net>: >>>> >>>> On Sat, 08.05.2010 at 12:20:05 +0200, Ulrich Sp?rlein wrote: >>>>> >>>>> This LOR also is not yet listed on the LOR page, so I guess it's rather >>>>> new. I do use SUJ. >>>>> >>>>> lock order reversal: >>>>> 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502 >>>>> 2nd 0xec0fe304 bufwait (bufwait) @ >>>>> /usr/src/sys/ufs/ffs/ffs_softdep.c:11363 >>>>> 3rd 0xc49e56b8 ufs (ufs) _at_ /usr/src/sys/kern/vfs_subr.c:2091 >>>>> KDB: stack backtrace: >>>>> db_trace_self_wrapper(c09394fe,fb817308,c062e515,c061e8ab,c093c4f1,...) >>>>> at db_trace_self_wrapper+0x26 >>>>> kdb_backtrace(c061e8ab,c093c4f1,c418b168,c418ef28,fb817364,...) at >>>>> kdb_backtrace+0x29 >>>>> _witness_debugger(c093c4f1,c49e56b8,c092e785,c418ef28,c094369d,...) at >>>>> _witness_debugger+0x25 >>>>> witness_checkorder(c49e56b8,9,c094369d,82b,0,...) at >>>>> witness_checkorder+0x839 >>>>> __lockmgr_args(c49e56b8,80100,c49e56d8,0,0,...) at __lockmgr_args+0x7f9 >>>>> ffs_lock(fb817488,c062e2bb,c0942b3f,80100,c49e5660,...) at >>>>> ffs_lock+0x82 >>>>> VOP_LOCK1_APV(c09bd600,fb817488,c4827cd4,c09d62a0,c49e5660,...) at >>>>> VOP_LOCK1_APV+0xb5 >>>>> _vn_lock(c49e5660,80100,c094369d,82b,4,...) at _vn_lock+0x5e >>>>> vget(c49e5660,80100,c4827c30,50,0,...) at vget+0xb9 >>>>> vfs_hash_get(c47bea20,b803,80000,c4827c30,fb8175d8,...) at >>>>> vfs_hash_get+0xe6 >>>>> ffs_vgetf(c47bea20,b803,80000,fb8175d8,1,...) at ffs_vgetf+0x49 >>>>> softdep_sync_metadata(c4838880,0,c0962957,144,0,...) at >>>>> softdep_sync_metadata+0xc82 >>>>> ffs_syncvnode(c4838880,1,c4827c30,fb817698,246,...) at >>>>> ffs_syncvnode+0x3e2 >>>>> ffs_truncate(c4838880,200,0,880,c41fb480,...) at ffs_truncate+0x862 >>>>> ufs_direnter(c4838880,c49e5660,fb81794c,fb817bd4,0,...) at >>>>> ufs_direnter+0x8d4 >>>>> ufs_makeinode(fb817bd4,0,fb817b30,fb817a94,c08e4cf5,...) at >>>>> ufs_makeinode+0x517 >>>>> ufs_create(fb817b30,fb817b48,0,0,fb817ba8,...) at ufs_create+0x30 >>>>> VOP_CREATE_APV(c09bd600,fb817b30,2,fb817ac0,0,...) at >>>>> VOP_CREATE_APV+0xa5 >>>>> vn_open_cred(fb817ba8,fb817c5c,1a4,0,c41fb480,...) at >>>>> vn_open_cred+0x1de >>>>> vn_open(fb817ba8,fb817c5c,1a4,c47e2428,0,...) at vn_open+0x3b >>>>> kern_openat(c4827c30,ffffff9c,804c5e8,0,602,...) at kern_openat+0x125 >>>>> kern_open(c4827c30,804c5e8,0,601,21b6,...) at kern_open+0x35 >>>>> open(c4827c30,fb817cf8,c0972725,c091f062,c47ea2a8,...) at open+0x30 >>>>> syscall(fb817d38) at syscall+0x220 >>>>> Xint0x80_syscall() at Xint0x80_syscall+0x20 >>>>> --- syscall (5, FreeBSD ELF32, open), eip = 0x2817bf33, esp = >>>>> 0xbfbfec4c, ebp = 0xbfbfecb8 --- >>>> >>>> And now the system is hanging again. While I can still ping and receive >>>> dmesg updates (eg. USB ports appearing), I/O is frozen solid. This is >>>> during portupgrade, when the configure script runs and usually takes 1-2 >>>> minutes to provoke. >>>> >>>> This part looks supsicious to me: >>>> >>>> db> show alllocks >>>> Process 28014 (mkdir) thread 0xc691ac30 (100152) >>>> exclusive lockmgr bufwait (bufwait) r = 0 (0xec2bdaf0) locked @ >>>> /usr/src/sys/ufs/ffs/ffs_softdep.c:10684 >>>> exclusive lockmgr ufs (ufs) r = 0 (0xc6bcd5a8) locked @ >>>> /usr/src/sys/kern/vfs_subr.c:2091 >>>> exclusive lockmgr bufwait (bufwait) r = 0 (0xec2983f4) locked @ >>>> /usr/src/sys/ufs/ffs/ffs_softdep.c:11363 >>>> exclusive lockmgr ufs (ufs) r = 0 (0xc6d976b8) locked @ >>>> /usr/src/sys/kern/vfs_lookup.c:502 >>>> Process 1990 (sshd) thread 0xc5462750 (100117) >>>> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc546e08c) locked @ >>>> /usr/src/sys/kern/uipc_sockbuf.c:148 >>>> Process 12 (intr) thread 0xc41f4750 (100004) >>>> exclusive sleep mutex ttymtx (ttymtx) r = 0 (0xc425ae04) locked @ >>>> /usr/src/sys/dev/dcons/dcons_os.c:232 >>>> db> >>> >>> Along with show alllocks may you also get the following from DDB: >>> ps, show pcpu, alltrace, lockedvnods. >> >> 1. a kernel before SUJ went in is running fine with SU only >> 2. the following is on a recent -CURRENT that has SUJ, *but* i've >> disabled it, so it is running with soft-updates only (I hope) >> >> I ran a portupgrade and the first configure script triggered the I/O >> hang >> >> db> ps >> pid ppid pgrp uid state wmesg wchan cmd >> 13467 13444 12937 0 R+ mkdir >> 13444 13204 12937 0 S+ wait 0xc54352a8 sh >> 13204 13035 12937 0 S+ wait 0xc5436000 sh >> 13035 12937 12937 0 S+ wait 0xc4ffad48 sh >> 12937 12936 12937 0 Ss+ wait 0xc4ff9d48 make >> 12936 3722 3722 0 R+ script >> 3722 2021 3722 0 S+ (threaded) ruby18 >> 100132 S wait 0xc4ffa7f8 ruby18 >> 2404 2007 2404 1000 Ss+ ttyin 0xc4d74870 zsh >> 2325 2015 2325 1000 R+ top >> 2021 2009 2021 0 S+ pause 0xc4ff9058 csh >> 2015 2007 2015 1000 Ss+ pause 0xc4ffa058 zsh >> 2009 2007 2009 1000 Ss+ pause 0xc4d4e850 zsh >> 2007 2006 2007 1000 Rs screen >> 2006 1991 2006 1000 R+ screen >> 2005 2001 2005 0 R+ systat >> 2001 1976 2001 0 S+ pause 0xc3d52058 csh >> 2000 1 2000 0 Ss select 0xc3d5b1a4 ssh-agent >> 1991 1990 1991 1000 Ss+ pause 0xc3d52850 zsh >> 1990 1986 1986 1000 S select 0xc3d5aca4 sshd >> 1989 1 1989 1000 Ss select 0xc3d5ae24 ssh-agent >> 1986 1884 1986 0 Ss sbwait 0xc4f2058c sshd >> 1985 1 1985 0 Ss+ ttyin 0xc3e62670 getty >> 1984 1 1984 0 Ss+ ttyin 0xc3e61c70 getty >> 1983 1 1983 0 Ss+ ttyin 0xc3e60070 getty >> 1982 1 1982 0 Ss+ ttyin 0xc3e60a70 getty >> 1981 1 1981 0 Ss+ ttyin 0xc3da7a70 getty >> 1980 1 1980 0 Ss+ ttyin 0xc3da7c70 getty >> 1979 1 1979 0 Ss+ ttyin 0xc3e60c70 getty >> 1976 1 1976 0 Ss+ wait 0xc4c392a8 login >> 1955 1 1955 0 Rs bsnmpd >> 1931 1 1931 0 Ss select 0xc4d0f164 inetd >> 1890 1 1890 0 Rs cron >> 1884 1 1884 0 Ss select 0xc4d0eda4 sshd >> 1808 1 1808 0 Rs sendmail >> 1764 1 1764 0 Rs moused >> 1754 1 1754 0 Ss (threaded) nslcd >> 100111 S accept 0xc4d076ae nslcd >> 100110 S accept 0xc4d076ae nslcd >> 100109 S accept 0xc4d076ae nslcd >> 100108 S accept 0xc4d076ae nslcd >> 100107 S accept 0xc4d076ae nslcd >> 100076 S uwait 0xc4493180 nslcd >> 1748 1 1748 0 Rs openvpn >> 1724 1 1724 201 Ss accept 0xc4d07b82 privoxy >> 1672 1 1672 0 Rs powerd >> 1664 1 1664 0 Rs ntpd >> 1658 1 1658 0 Ss (threaded) nscd >> 100094 S kqread 0xc4cf1700 nscd >> 100093 S kqread 0xc4cf1700 nscd >> 100092 S kqread 0xc4cf1700 nscd >> 100091 S kqread 0xc4cf1700 nscd >> 100090 S kqread 0xc4cf1700 nscd >> 100089 S kqread 0xc4cf1700 nscd >> 100088 S kqread 0xc4cf1700 nscd >> 100087 S kqread 0xc4cf1700 nscd >> 100075 S uwait 0xc449f680 nscd >> 1576 1 1576 389 Ss (threaded) slapd >> 100115 S ucond 0xc4cf2700 slapd >> 100086 S ucond 0xc4cf2600 slapd >> 100085 S select 0xc4d0eea4 slapd >> 100073 S uwait 0xc42a9580 slapd >> 1563 1 1562 0 S nanslp 0xc09f3864 smartd >> 1526 1 1526 0 Rs rpc.statd >> 1445 1 1445 0 Rs amd >> 1418 1 1418 0 Rs rpcbind >> 1403 1 1403 53 Ss (threaded) named >> 100084 S kqread 0xc4cdcb80 named >> 100083 S ucond 0xc4260b00 named >> 100082 S ucond 0xc449fb80 named >> 100074 S sigwait 0xfb4b8be0 named >> 1306 1 1306 0 Rs syslogd >> 997 0 0 0 RL [pfpurge] >> 973 1 973 0 Ss select 0xc41eb0a4 devd >> 964 1 964 65 Ss select 0xc44ac524 dhclient >> 942 1 942 0 Ss select 0xc426f424 dhclient >> 856 1 856 0 Ss select 0xc41eb424 moused >> 561 1 561 0 Ss select 0xc44acde4 wpa_supplicant >> 150 1 150 0 Ss pause 0xc42ec850 adjkerntz >> 95 0 0 0 DL mdwait 0xc426a000 [md0] >> 74 0 0 0 DL geli:w 0xc4265400 [g_eli[0] ad0s4f] >> 22 0 0 0 RL [softdepflush] >> 21 0 0 0 RL [syncer] >> 20 0 0 0 RL [vnlru] >> 19 0 0 0 RL [bufdaemon] >> 18 0 0 0 RL [pagezero] >> 17 0 0 0 DL psleep 0xc0b769d8 [vmdaemon] >> 16 0 0 0 RL [pagedaemon] >> 9 0 0 0 DL ccb_scan 0xc09d78d4 [xpt_thrd] >> 15 0 0 0 DL tzpoll 0xc09dafcc [acpi_thermal] >> 8 0 0 0 SL - 0xc3f55000 [fw0_probe] >> 7 0 0 0 DL - 0xc3ebd3bc [cbb0 event thread] >> 14 0 0 0 DL (threaded) [usb] >> 100040 D - 0xc3f39d0c [usbus3] >> 100039 D - 0xc3f39cdc [usbus3] >> 100038 D - 0xc3f39cac [usbus3] >> 100037 D - 0xc3f39c7c [usbus3] >> 100036 D - 0xc3f1fdac [usbus2] >> 100035 D - 0xc3f1fd7c [usbus2] >> 100034 D - 0xc3f1fd4c [usbus2] >> 100033 D - 0xc3f1fd1c [usbus2] >> 100032 D - 0xc3f0ddac [usbus1] >> 100031 D - 0xc3f0dd7c [usbus1] >> 100030 D - 0xc3f0dd4c [usbus1] >> 100029 D - 0xc3f0dd1c [usbus1] >> 100028 D - 0xc3ef7dac [usbus0] >> 100027 D - 0xc3ef7d7c [usbus0] >> 100026 D - 0xc3ef7d4c [usbus0] >> 100025 D - 0xc3ef7d1c [usbus0] >> 6 0 0 0 DL crypto_r 0xc0b7446c [crypto returns] >> 5 0 0 0 DL crypto_w 0xc0b74448 [crypto] >> 13 0 0 0 DL - 0xc09f36c4 [yarrow] >> 4 0 0 0 DL - 0xc09f13e4 [g_down] >> 3 0 0 0 DL - 0xc09f13e0 [g_up] >> 2 0 0 0 RL [g_event] >> 12 0 0 0 RL (threaded) [intr] >> 100049 I [irq12: psm0] >> 100048 I [irq1: atkbd0] >> 100046 I [irq15: ata1] >> 100045 I [irq14: ata0] >> 100024 I [irq11: cbb0 bfe0++*] >> 100023 I [irq9: pcm0 iwi0+] >> 100022 I [swi6: Giant taskq] >> 100020 I [swi5: +] >> 100019 I [swi2: cambio] >> 100014 I [swi6: task queue] >> 100006 I [swi3: vm] >> 100005 Run CPU 0 [swi4: clock] >> 100004 I [swi1: netisr 0] >> 11 0 0 0 RL [idle: cpu0] >> 1 0 1 0 SLs wait 0xc3df2d48 [init] >> 10 0 0 0 DL audit_wo 0xc0b747e0 [audit] >> 0 0 0 0 RLs (threaded) [kernel] >> 100050 D deadlkre 0xc09f36c4 [deadlkres] >> 100044 D - 0xc3f5db80 [iwi0 taskq] >> 100042 D - 0xc3f5e8c0 [fw0_taskq] >> 100021 D - 0xc3ea08c0 [thread taskq] >> 100018 D - 0xc3ea0a80 [kqueue taskq] >> 100017 D - 0xc3ea0c40 [acpi_task_2] >> 100016 D - 0xc3ea0c40 [acpi_task_1] >> 100015 D - 0xc3ea0c40 [acpi_task_0] >> 100010 D - 0xc3dda9c0 [firmware taskq] >> 100000 RunQ [swapper] >> 1999 1976 1976 0 Z+ ssh-agent >> 1988 1986 1986 1000 Z ssh-agent >> db> show alllocks >> Process 13467 (mkdir) thread 0xc543b270 (100150) >> exclusive lockmgr bufwait (bufwait) r = 0 (0xebd1f8ec) locked @ >> /usr/src/sys/kern/vfs_bio.c:2581 >> exclusive lockmgr ufs (ufs) r = 0 (0xc542fc08) locked @ >> /usr/src/sys/kern/vfs_subr.c:2091 >> exclusive lockmgr bufwait (bufwait) r = 0 (0xebd15194) locked @ >> /usr/src/sys/ufs/ffs/ffs_softdep.c:11363 >> exclusive lockmgr ufs (ufs) r = 0 (0xc54279e8) locked @ >> /usr/src/sys/kern/vfs_lookup.c:502 >> Process 1986 (sshd) thread 0xc4d2ec30 (100113) >> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xc4f20560) locked @ >> /usr/src/sys/kern/uipc_sockbuf.c:148 >> Process 22 (softdepflush) thread 0xc428f750 (100058) >> exclusive sleep mutex struct mount mtx (struct mount mtx) r = 0 >> (0xc42a3a20) locked @ /usr/src/sys/kern/vfs_subr.c:341 >> Process 12 (intr) thread 0xc3df44e0 (100005) >> exclusive sleep mutex ttymtx (ttymtx) r = 0 (0xc3e63604) locked @ >> /usr/src/sys/dev/dcons/dcons_os.c:232 >> db> show pcpu >> cpuid = 0 >> dynamic pcpu = 0x64e400 >> curthread = 0xc3df44e0: pid 12 "swi4: clock" >> curpcb = 0xc3bfbd90 >> fpcurthread = none >> idlethread = 0xc3df49c0: pid 11 "idle: cpu0" >> APIC ID = 0 >> currentldt = 0x50 >> spin locks held: >> db> show lockedvnods >> Locked vnodes >> >> 0xc5427990: tag ufs, type VDIR >> usecount 5, writecount 0, refcount 7 mountedhere 0 >> flags () >> v_object 0xc54224c8 ref 0 pages 0 >> lock type ufs: EXCL by thread 0xc543b270 (pid 13467) >> ino 11375831, on dev ad0s4f.eli >> >> 0xc542fbb0: tag ufs, type VREG >> usecount 2, writecount 0, refcount 5 mountedhere 0 >> flags () >> v_object 0xc5422880 ref 0 pages 7 >> lock type ufs: EXCL by thread 0xc543b270 (pid 13467) >> ino 11375884, on dev ad0s4f.eli >> db> alltrace >> >> Tracing command mkdir pid 13467 tid 100150 td 0xc543b270 >> sched_switch(c543b270,0,104,191,f614b28d,...) at sched_switch+0x36a >> mi_switch(104,0,c093a853,1eb,4c,...) at mi_switch+0x200 >> sleepq_switch(c543b270,0,c093a853,260,0,...) at sleepq_switch+0x15f >> sleepq_wait(ebd1f88c,4c,c0941795,0,0,...) at sleepq_wait+0x63 >> _sleep(ebd1f88c,c3da14ac,4c,c0941795,0,...) at _sleep+0x365 >> bwait(ebd1f88c,4c,c0941795,ebd1f88c,fb5d5584,...) at bwait+0x6f >> bufwait(ebd1f88c,ebd1f88c,0,ebd1f88c,c54165e4,...) at bufwait+0x48 >> bufwrite(ebd1f88c,0,c09625e3,74d,0) at bufwrite+0x165 >> ffs_bufwrite(ebd1f88c,c542d500,100,4000,0,...) at ffs_bufwrite+0x290 >> ffs_update(c542fbb0,1,80000,fb5d567c,1,...) at ffs_update+0x27c >> softdep_sync_metadata(c5427990,0,c0962b76,144,0,...) at >> softdep_sync_metadata+0xc9f >> ffs_syncvnode(c5427990,1,c543b270,fb5d573c,246,...) at ffs_syncvnode+0x3e2 >> ffs_truncate(c5427990,400,0,880,c4c4f680,...) at ffs_truncate+0x862 >> ufs_direnter(c5427990,c542f660,fb5d5a1c,fb5d5c00,ebd2a930,...) at >> ufs_direnter+0x8d4 >> ufs_mkdir(fb5d5c28,fb5d5c3c,0,fb5d5bd4,fb5d5b6c,...) at ufs_mkdir+0x81f >> VOP_MKDIR_APV(c09bd840,fb5d5c28,68,fb5d5bc0,0,...) at VOP_MKDIR_APV+0xa5 >> kern_mkdirat(c543b270,ffffff9c,bfbfeaa2,0,1ff,...) at kern_mkdirat+0x211 >> kern_mkdir(c543b270,bfbfeaa2,0,1ff,fb5d5d2c,...) at kern_mkdir+0x2e >> mkdir(c543b270,fb5d5cf8,c,c,c5434aa0,...) at mkdir+0x29 >> syscall(fb5d5d38) at syscall+0x220 >> Xint0x80_syscall() at Xint0x80_syscall+0x20 >> --- syscall (136, FreeBSD ELF32, mkdir), eip = 0x28178ef3, esp = >> 0xbfbfe7ec, ebp = 0xbfbfe8b8 --- >> >> Tracing command sh pid 13444 tid 100147 td 0xc4fff9c0 >> sched_switch(c4fff9c0,0,104,191,f5d80613,...) at sched_switch+0x36a >> mi_switch(104,0,c093a853,1eb,5c,...) at mi_switch+0x200 >> sleepq_switch(c4fff9c0,0,c093a853,1a0,5c,...) at sleepq_switch+0x15f >> sleepq_catch_signals(c093a853,160,0,100,100,...) at >> sleepq_catch_signals+0xb7 >> sleepq_wait_sig(c54352a8,5c,c093d112,100,0,...) at sleepq_wait_sig+0x17 >> _sleep(c54352a8,c5435330,15c,c093d112,0,...) at _sleep+0x34e >> kern_wait(c4fff9c0,ffffffff,fb5ccc74,2,0,...) at kern_wait+0xb76 >> wait4(c4fff9c0,fb5cccf8,c0972944,c093d0f1,c54352a8,...) at wait4+0x3d >> syscall(fb5ccd38) at syscall+0x220 >> Xint0x80_syscall() at Xint0x80_syscall+0x20 >> --- syscall (7, FreeBSD ELF32, wait4), eip = 0x281681eb, esp = 0xbfbfe5bc, >> ebp = 0xbfbfe5d8 --- >> >> >> <snip> >> I hope you don't really need all of the traces, I'll keep the system at >> the DDB prompt for now, so I can do other stuff. > > I really need any trace that is involved in vfs. Is this machine hooked to > a serial console for ddb so you can pipe output to a file with script(1)? I > understand the paths that hold the locks but I don't understand who is > deadlocked against it. It looks like the thread that is holding the locks > is just waiting on io to complete. So something that completes IO is hung. May you please show the 'sysctl vm' printout? Attilio -- Peace can only be achieved by understanding - A. EinsteinReceived on Sat May 08 2010 - 23:33:16 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:03 UTC