truss(1) locked in kernel with 8.0-BETA2

From: Jeremie Le Hen <jeremie_at_le-hen.org>
Date: Sat, 22 Aug 2009 20:58:12 +0200
Hi,

I've upgraded my laptop to 8.0-BETA2 and ran portupgrade under script(1).
According to top, script(1) is now spinning at 100% CPU, mostly in system
time, even after I've hit ^C:

% CPU:  0.0% user,  7.9% nice, 92.1% system,  0.0% interrupt,  0.0% idle
% Mem: 64M Active, 675M Inact, 141M Wired, 21M Cache, 111M Buf, 90M Free
% Swap: 1024M Total, 1024M Free
% 
% PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
% 47998 root          1 128   10  3344K   788K RUN      3:02 100.00% script

jarjarbinks:~:7# procstat -k 47998
  PID    TID COMM             TDNAME           KSTACK                       
47998 100087 script           -                mi_switch critical_exit intr_event_handle intr_execute_handlers lapic_handle_intr Xapic_isr1 binuptime bintime microtime gettimeofday syscall Xint0x80_syscall 

truss(1) shows the following sequence repeating indefinitely:
% gettimeofday({1250965404.830938 },0x0)           = 0 (0x0)
% select(5,{0 4},0x0,0x0,{30.000000 })             = 1 (0x1)
% read(0,0xbfbfdf5c,1024)                          = 0 (0x0)
% write(4,0xbfbfdf5c,0)                            = 0 (0x0)
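
For illustration, here is a minimal C sketch of the kind of loop that
would produce such a trace.  It is not taken from the script(1) source;
the helper name count_eof_wakeups and the iteration cap are made up for
the example.  The assumption is that the loop keeps passing a descriptor
to select(2) after read(2) has returned 0 on it: a descriptor at
end-of-file is always reported readable, so select(2) returns at once
and the 30 second timeout never applies.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from script(1)): run a select()/read() loop
 * on fd for at most max_iters rounds and count how many rounds ended
 * with read() returning 0, i.e. end-of-file.
 */
int
count_eof_wakeups(int fd, int max_iters)
{
	char buf[1024];
	int eofs = 0;

	assert(fd >= 0 && fd < FD_SETSIZE);
	for (int i = 0; i < max_iters; i++) {
		fd_set rfds;
		struct timeval tv = { 30, 0 };	/* 30 s timeout, as in the trace */

		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		/* At EOF the descriptor is readable, so this returns immediately. */
		if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
			break;
		ssize_t n = read(fd, buf, sizeof(buf));
		if (n < 0)
			break;
		if (n == 0)
			eofs++;	/* a loop that continues from here never sleeps */
	}
	return eofs;
}
```

Called on the read end of a pipe whose write end has been closed, every
round should hit end-of-file without blocking.  A loop written this way
would need to stop selecting on the descriptor (or exit) after the first
0-byte read to avoid spinning.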

And when I try to stop truss, it never ends and seems to be blocked in the kernel:

ps aux:
root   49506  0.0  0.1  3284   912   2  I+    8:23PM   0:00.06 truss -p 47998
root   47998  0.0  0.1  3344   788   1  TNX+  8:39AM   6:37.36 /usr/bin/script -qa /tmp/portupgrade20090822-4080-3knxs0-0 env UPGRA

procstat -k:
49506 100121 truss            -                mi_switch sleepq_switch sleepq_catch_signals sleepq_wait_sig _sleep kern_wait wait4 syscall Xint0x80_syscall
47998 100087 script           -                mi_switch thread_suspend_switch ptracestop syscall Xint0x80_syscall

Note that once I've truss(1)'ed script(1), it stops consuming all the
CPU cycles.  It seems to be deadlocked, but DDB shows only one lock, so
this is not a deadlock strictly speaking.  truss(1) is obviously
waiting for something that will never come, although I can still kill
it with SIGKILL.


The same thing can happen with dialog(1); truss shows:
% read(0,0xbfbfcf10,1)                             = 0 (0x0)
% read(0,0xbfbfcf10,1)                             = 0 (0x0)
% read(0,0xbfbfcf10,1)                             = 0 (0x0)
% read(0,0xbfbfcf10,1)                             = 0 (0x0)

And so on.  And after ^C the processes are stalled in the same state in
the kernel.

I've reproduced this twice in a row across a reboot.  I think I can
reproduce it on demand.

Any ideas would be welcome.
Thanks.
Regards,
-- 
Jeremie Le Hen
< jeremie at le-hen dot org >< ttz at chchile dot org >
Received on Sat Aug 22 2009 - 17:16:58 UTC