John Baldwin wrote:
> I figured out a deadlock I got on a 4 CPU testbox last night and I think I
> have tracked it down to a bug in 4BSD that can be worked around by turning
> off idle halt via the machdep.cpu_idle_hlt sysctl. Specifically, it involves
> waking up halted CPUs when a thread that is pinned or bound to a specific CPU
> is made runnable. The details of what I saw today are below:

Did you try setting the sysctl that makes the idle IPI get sent to ALL
idle cpus and not just the one that is selected?

>
> FreeBSD/i386 (deimos.baldwin.cx) (ttyd0)
>
> login: KDB: enter: Line break on console
> [thread pid 13 tid 100006 ]
> Stopped at kdb_enter+0x30: leave
> db> show pcpu
> cpuid = 3
> curthread = 0xc1dfd5c0: pid 11 "idle: cpu3"
> curpcb = 0xdc608da0
> fpcurthread = none
> idlethread = 0xc1dfd5c0: pid 11 "idle: cpu3"
> APIC ID = 3
> currentldt = 0x28
> db> ps
> pid proc uid ppid pgrp flag stat wmesg wchan cmd
> 79079 c2ebe1fc 1001 79067 610 0000012 [LOCK vm page queue mutex c1f90c40] make
> 79078 c2f2ede4 1001 78894 610 0000002 [LOCK vm page queue mutex c1f90c40] sh
> 79077 c3498000 1001 79043 610 0004002 [LOCK vm page queue mutex c1f90c40] make
> 79075 c2f317f0 1001 78898 610 0004002 [LOCK vm page queue mutex c1f90c40] make
> 79074 c2184de4 1001 78902 610 0004002 [LOCK vm page queue mutex c1f90c40] make
> 79067 c23c71fc 1001 78955 610 0004002 [SLPQ ppwait 0xc23c71fc][SLP] make
> 79060 c23e35f4 1001 78915 610 0004002 [LOCK Giant c1e41800] make
> 79043 c2a50000 1001 79042 610 0004002 [SLPQ wait 0xc2a50000][SLP] sh
> 79042 c2f493f8 1001 78884 610 0004002 [SLPQ select 0xc075e024][SLP] make
> 78978 c28de7f0 1001 78975 610 0004002 [RUNQ] sh
> 78975 c29bf1fc 1001 78970 610 0004002 [SLPQ select 0xc075e024][SLP] make
> ...
> 40 c1e67000 0 0 0 0000204 [IWAIT] swi1: net
> 39 c1e671fc 0 0 0 0000204 [IWAIT] swi3: vm
> 38 c1e4b5f4 0 0 0 000020c [LOCK Giant c1e41800] swi4: clock sio
> --More--
> 37 c1e4b7f0 0 0 0 0000204 [IWAIT] irq13:
> 36 c1e4b9ec 0 0 0 0000204 [IWAIT] irq0: clk
> 35 c1e4bbe8 0 0 0 0000204 [IWAIT] irq23: ahc0 ahc1
> 34 c1e4bde4 0 0 0 0000204 [IWAIT] irq22:
> 33 c1e5b000 0 0 0 0000204 [IWAIT] irq21:
> 32 c1e5b1fc 0 0 0 0000204 [IWAIT] irq20:
> 31 c1e5b3f8 0 0 0 0000204 [LOCK Giant c1e41800] irq19: fxp0 uhci0
> ...
> 14 c1dfc5f4 0 0 0 000020c [Can run] idle: cpu0
> 13 c1dfc7f0 0 0 0 000020c [CPU 1] idle: cpu1
> 12 c1dfc9ec 0 0 0 000020c [CPU 2] idle: cpu2
> 11 c1dfcbe8 0 0 0 000020c [CPU 3] idle: cpu3
>
> Thus, CPU 0 is running a make process while CPUs 1, 2, and 3 are all idle.
> Note that softclock() and irq19 (me trying to ssh in this morning over fxp0)
> are both blocked on Giant. Also note that there are 5 threads blocked on the
> vm page queue mutex. First, let's see who owns Giant and what they are doing.
> I was doing a benchmark, so I didn't have witness compiled in or turned on, so
> 'show alllocks' wasn't an option.
>
> db> x/x Giant,10
> Giant: c0730028 c06ffb2c c06ffb2c b0000 0
> Giant+0x14: 0 0 c2ebd8a2 0
> Giant+0x24: 0 c07300a0 0 0
> time_monitor: 92c86 7 420c903b
>
> The owner is 0xc2ebd8a2 but minus 0x2 since that is a flag:
>
> db> show threads
> ...
> 100286 (0xc2ebd8a0) sched_switch(c2ebd8a0,0,1,77,fcb6cb7e) at sched_switch+0x170
> ...
> 100189 (0xc28dfcf0) sched_switch(c28dfcf0,0,2,dff58a7c,95a67b30) at sched_switch+0x170
> ...
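As an aside for anyone replaying this in ddb: the reason the 0x2 gets
masked off there is that the low bits of the mtx_lock word are flag bits
rather than part of the owner pointer.  Very roughly, and only as a
sketch with a made-up mask name/value (this is not the in-tree mutex
code):

#include <sys/types.h>

struct thread;                          /* opaque here; only the pointer matters */

/*
 * Sketch: mtx_lock holds the owning struct thread pointer with flag
 * bits (e.g. the 0x2 "contested" bit seen above) OR'd into the low
 * bits, so they have to be cleared before the value can be treated
 * as a pointer.  MTX_FLAG_BITS is an assumed value for illustration.
 */
#define MTX_FLAG_BITS   0x3UL

static __inline struct thread *
mtx_owner_from_lockword(uintptr_t lockword)
{
        return ((struct thread *)(lockword & ~MTX_FLAG_BITS));
}

/* 0xc2ebd8a2 & ~0x3 == 0xc2ebd8a0, which is the td "show threads" lists. */
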
> db> tr 100286
> Tracing pid 79075 tid 100286 td 0xc2ebd8a0
> sched_switch(c2ebd8a0,0,1,77,fcb6cb7e) at sched_switch+0x170
> mi_switch(1,0,c07026d2,254,c075d568) at mi_switch+0x1d9
> turnstile_wait(c076bda0,c28dfcf0,c2ebd8a0,d17bc930,c21a79e8) at turnstile_wait+0x269
> _mtx_lock_sleep(c076bda0,c2ebd8a0,0,c0704083,dc5) at _mtx_lock_sleep+0x101
> _mtx_lock_flags(c076bda0,0,c0704083,dc5,c0704083) at _mtx_lock_flags+0x45
> vfs_clean_pages(d17bc930,0,c0704083,423,c21a79b4) at vfs_clean_pages+0x70
> bdwrite(d17bc930,c3bdfa00,80,2000,0) at bdwrite+0x2f0
> ffs_update(c46bf564,0,1,50,e00efb2c) at ffs_update+0x396
> ufs_inactive(e00efb44,e00efb5c,c05978ea,e00efb44,0) at ufs_inactive+0x225
> VOP_INACTIVE_AP(e00efb44,0,c0705028,768,c074e7a0) at VOP_INACTIVE_AP+0x4e
> vrele(c46bf564,c074e2a0,c46bf564,1,c255a700) at vrele+0x11a
> vn_close(c46bf564,1,c255a700,c2ebd8a0,0) at vn_close+0x50
> vn_closefile(c30e6318,c2ebd8a0,c06fddb9,846,c30e6318) at vn_closefile+0xf4
> fdrop_locked(c30e6318,c2ebd8a0,c06fddb9,831) at fdrop_locked+0x94
> fdrop(c30e6318,c2ebd8a0,c06fddb9,77c,c0515cfd,c0737ce0,1000,c2ebd8a0,e00efcac,c0
> 56009c,1,e00efc80,c255a700,0,e00efc90,e00efc90,c0513783,3f,c31b93c8,0,6,c31b9300,e00efcb0,c051389d,c31b9300,0,6) at fdrop+0x3c
> closef(c30e6318,c2ebd8a0,c06fddb9,3e8,c2ebd8a0) at closef+0x3f2
> close(c2ebd8a0,e00efd14,4,e00efd00,1) at close+0x1f2
> syscall(2f,2f,2f,809c200,0) at syscall+0x2a0
> Xint0x80_syscall() at Xint0x80_syscall+0x1f
> --- syscall (6, FreeBSD ELF32, close), eip = 0x805c743, esp = 0xbfbfd18c, ebp = 0xbfbfd1a8 ---
>
> Looking up the pid in the ps output above, I see that this thread is one of the
> threads blocked on the vm page queue mutex. Ok, so let's look at that lock's
> owner:
>
> db> x/x 0xc076bda0,10
> vm_page_queue_mtx: c0730028 c07100af c07100af b0000
> vm_page_queue_mtx+0x10: 0 0 0 c28dfcf2
> vm_page_queue_mtx+0x20: 0 0 c1e673f8 f540
> vm_pages_needed: 0 0 0 1552
>
> Lock owner is 0xc28dfcf2. Looked that up via show threads and got:
>
> db> tr 100189
> Tracing pid 78978 tid 100189 td 0xc28dfcf0
> sched_switch(c28dfcf0,0,2,dff58a7c,95a67b30) at sched_switch+0x170
> mi_switch(2,0,c0701343,252,0) at mi_switch+0x1d9
> critical_exit(c074bcc0,c052ddd5,c0755f18,0,c28dfcf0) at critical_exit+0x7e
> intr_execute_handlers(c074bcc0,dff58b10,c076bc00,dff58b1c,c28dfcf0) at intr_execute_handlers+0xa4
> atpic_handle_intr(0) at atpic_handle_intr+0xaf
> Xatpic_intr0() at Xatpic_intr0+0x20
> --- interrupt, eip = 0xc054dd3c, esp = 0xdff58b54, ebp = 0xdff58b68 ---
> critical_exit(dff58b94,c06bbcbd,c075cae0,0,c0715535) at critical_exit+0xcc
> _mtx_unlock_spin_flags(c075cae0,0,c0715535,252,e) at _mtx_unlock_spin_flags+0x23
> pmap_invalidate_page(c29c869c,8064000,2,766,c19d6d10) at pmap_invalidate_page+0xdd
> pmap_enter(c29c869c,8064000,c19d6d10,3,0) at pmap_enter+0x33d
> vm_fault(c29c85dc,8064000,2,8,c28dfcf0) at vm_fault+0x13e7
> trap_pfault(dff58d48,1,80643e8,bfbfe060,80643e8) at trap_pfault+0xf1
> trap(2f,2f,2f,0,8069000) at trap+0x21a
> calltrap() at calltrap+0x5
> --- trap 0xc, eip = 0x80563cf, esp = 0xbfbfe0d0, ebp = 0xbfbfe0d8 ---
>
> So this is the thread that is in state RUNQ even though I have three idle
> CPUs. Very curious! I then wondered if my current CPU (CPU 3) thought
> that there were any runnable threads:
>
> db> call sched_runnable()
> 0
>
> 0 means no. That's when I figured that the runnable thread must be pinned,
> especially since it was interrupted while messing with the pmap which
> usually requires pinning.
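For anyone who hasn't stared at the pmap code: the pinning being talked
about follows roughly the pattern below.  This is only a sketch of the
idiom, not the actual pmap_invalidate_page() code (the real thing is
more involved; the _mtx_unlock_spin_flags()/critical_exit() frames in
the trace hint at the SMP shootdown path).  The point is just how a
thread ends up glued to one CPU in the middle of an operation:

#include <sys/param.h>
#include <sys/sched.h>
#include <machine/cpufunc.h>

/*
 * Sketch only: pin so we cannot migrate while touching per-CPU state
 * (here, the local TLB).  If we are preempted between sched_pin() and
 * sched_unpin(), we go back on the run queue still restricted to this
 * CPU -- which is how pid 78978 above became runnable on exactly one
 * (halted) CPU.
 */
static void
flush_local_mapping(vm_offset_t va)
{
        sched_pin();            /* forbid migration to another CPU */
        invlpg(va);             /* flush this CPU's TLB entry for va */
        sched_unpin();          /* allow migration again */
}

The window between the two calls is, if jhb's reading is right, exactly
where the thread above got preempted by the atpic interrupt.
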
> I then tried to do an experiment to test my theory
> by turning off the idle halt and manually IPI'ing all the other CPUs so that
> they would recheck their run queues and whichever CPU needed to run the
> process owning the vm page queue mutex would run. Unfortunately, I think I
> got the vector number wrong as all 3 CPUs paniced. :-P
>
> db> x cpu_idle_hlt
> cpu_idle_hlt: 1
> db> w cpu_idle_hlt 0
> cpu_idle_hlt 0x1 = 0
> db> call lapic_ipi_raw(0x000c00e7, 0)
> 0xd1622000
> db> c
> cccpppuuuiiiddd === 102;;; aaapppiiiccc iiiddd === 000102
>
>
> iiinnnssstttrrruuuccctttiiiooonnn pppoooiiinnnttteerer r =
> = = 0 x008xx:80:8x0:cx00cx60ca50536bae055eb9e
> a5st
> as
> ctkas ctpkao cipkno tipenorti en rt e r = 0 x= 1 0 :0=0x
> x10d0xc:16000xed:cf0bex40dfc
> afb6r40abm
> ...
> 857 (make)
> trap number = 30
> panic: unknown/reserved trap
> cpuid = 0
> Uptime: 1d16h58m16s
> Cannot dump. No dump device defined.
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> cpu_reset: Stopping other CPUs
>
> Thus, my theory is that when the pinned thread was preempted and put back on
> the run queue, the scheduler didn't IPI the CPU it was pinned to, to wake it
> up in case it was idle. The IPI is only needed if the CPUs are halted, which
> is why I think turning the idle halt off might work as a workaround. I don't
> know if ULE has this same issue, but I've cc'd Jeff and hopefully he can look
> into it.
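If that theory is right, the missing piece is a wakeup in the 4BSD
sched_add() path.  Just to make the idea concrete, here is a sketch of
the kind of check being described -- not a patch; the field names and
the use of ipi_selected()/IPI_AST as the wakeup are assumptions for
illustration, and the code would sit in sched_4bsd.c, which already
pulls in the headers it needs:

/*
 * Sketch only: after a pinned/bound thread has been put on the run
 * queue of the one CPU it may run on, kick that CPU out of HLT so its
 * idle loop re-checks sched_runnable().  Without some such kick, an
 * idle-halted CPU never notices the thread -- the deadlock above.
 */
static void
kick_pinned_cpu(struct thread *td)
{
        int cpu;

        if (td->td_pinned == 0)         /* bound threads need the same treatment */
                return;
        cpu = td->td_lastcpu;           /* assumed: the CPU the thread is stuck to */
        if (cpu != PCPU_GET(cpuid))
                ipi_selected(1 << cpu, IPI_AST);        /* harmless wakeup IPI */
}

Whether this should be conditional on machdep.cpu_idle_hlt (or on the
target CPU actually sitting in its idle loop) is a separate question;
the sketch just sends the IPI unconditionally.
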