With GENERIC HEAD from Feb 19 13:36 UTC + mpsafe_vfs = 1 I got a new
livelock:

http://www.holm.cc/stress/log/cons118.html

This time I think I have a clue to what the problem is. One of the
stress test programs (swap) works like this pseudo code:

    c = malloc(size);
    page = getpagesize();
    while (done_testing == 0) {
        i = 0;
        while (i < size && done_testing == 0) {
            c[i] = 0;
            i += page;
        }
    }

Could it be that two incarnations of this program can monopolize the
run queue?

$ sort -n +4 < /var/crash/ps.186 | grep " R"
 UID   PID  PPID CPU PRI NI   VSZ RSS MWCHAN STAT TT       TIME COMMAND
   0     5     0   0   8  0     0   0 -      RL   ??    0:00.00 [thread tas
   0 68391 68390   8  97  0   320 120 -      RE   ??    0:00.01 [atrun]
1001 68342 68326 295 131  0 17628   0 -      R+   #C: 192:43.57 [swap]
1001 68354 68326 295 131  0 13268   0 -      R+   #C: 192:42.13 [swap]
1001 68331 68325 288 132  0  1224   0 -      R+   #C:   0:00.02 [creat]
1001 68332 68325 288 132  0  1224   0 -      R+   #C:   0:00.02 [creat]
1001 68333 68325 288 132  0  1224   0 -      R+   #C:   0:00.02 [creat]
1001 68334 68325 288 132  0  1224   0 -      R+   #C:   0:00.03 [creat]
1001 68335 68325 288 132  0  1224   0 -      R+   #C:   0:00.10 [creat]
1001 68336 68325 288 132  0  1224   0 -      R+   #C:   0:00.07 [creat]
1001 68361 68328 290 132  0  1232   0 -      R+   #C:   0:00.42 [tcp]
1001 68362 68329 288 132  0  1252   0 -      R+   #C:   0:00.06 [udp]
1001 68363 68329 288 132  0  1252   0 -      R+   #C:   0:00.04 [udp]
1001 68368 68360 288 132  0  1320   0 -      R+   #C:   0:00.05 [tcp]
1001 68369 68361 290 132  0  1320   0 -      R+   #C:   0:00.56 [tcp]
1001 68387 68338 288 132  0  1656   0 -      R+   #C:   0:00.02 [sh]
1001 68388 68340 288 132  0  1664   0 -      R+   #C:   0:00.02 [sh]
1001 68389 68388 288 132  0     0   0 -      RE+  #C:   0:00.02 [swapinfo]
1001 68390 68388 288 132  0  1204   0 -      R+   #C:   0:00.01 [tail]
   0    11     0 262 171  0     0   0 -      RL   ??  345:02.29 [idle: cpu0

At a later freeze today a "kill 1 <swap pid>" from kdb unfroze the box.

-- 
Peter Holm

Received on Tue Feb 22 2005 - 14:48:24 UTC