Livelock with GENERIC HEAD from Feb 19 13:36 UTC

From: Peter Holm <peter_at_holm.cc>
Date: Tue, 22 Feb 2005 16:48:20 +0100
With a GENERIC kernel built from HEAD as of Feb 19 13:36 UTC, and with
mpsafe_vfs = 1, I got a new livelock:

http://www.holm.cc/stress/log/cons118.html

This time I think I have a clue as to what the problem is. One of
the stress test programs, swap, works like this pseudo code:

c = malloc(size);
page = getpagesize();
while (done_testing == 0) {
	i = 0;
	while (i < size && done_testing == 0) {
		c[i] = 0;
		i += page;
	}
}
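Here is a self-contained C version of the pseudo code above. It is only a sketch: the original does not say how size is chosen or how done_testing gets set, so the SIGALRM timer used to stop the loop and the function name swap_stress() are assumptions made for illustration.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Set from the SIGALRM handler to end the test (assumed stop mechanism). */
static volatile sig_atomic_t done_testing = 0;

static void
stop_testing(int sig)
{
	(void)sig;
	done_testing = 1;
}

/*
 * Hypothetical helper: allocate size bytes and write one byte per page,
 * pass after pass, until done_testing is set after `seconds` seconds.
 * Returns the number of complete passes, or -1 on bad arguments or if
 * malloc() fails.
 */
static long
swap_stress(size_t size, unsigned int seconds)
{
	long pagesz = sysconf(_SC_PAGESIZE);	/* POSIX form of getpagesize() */
	volatile char *c;
	long passes = 0;

	assert(pagesz > 0);
	if (size == 0 || seconds == 0)	/* alarm(0) would never stop the loop */
		return -1;
	c = malloc(size);
	if (c == NULL)
		return -1;

	done_testing = 0;
	signal(SIGALRM, stop_testing);
	alarm(seconds);
	while (done_testing == 0) {
		size_t i = 0;
		while (i < size && done_testing == 0) {
			c[i] = 0;	/* volatile store: each page really is dirtied */
			i += (size_t)pagesz;
		}
		if (i >= size)	/* count only passes that reached the end */
			passes++;
	}
	signal(SIGALRM, SIG_DFL);
	free((void *)c);
	return passes;
}
```

For example, swap_stress(64 * 1024 * 1024, 60) keeps dirtying a 64 MB buffer for a minute; the buffer size actually used by the stress test is not stated here.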

Could it be that two incarnations of this program can monopolize
the run queue?

$ sort -n +4 < /var/crash/ps.186 | grep " R"
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
    0     5     0   0   8  0     0     0 -      RL    ??    0:00.00 [thread tas
    0 68391 68390   8  97  0   320   120 -      RE    ??    0:00.01 [atrun]
 1001 68342 68326 295 131  0 17628     0 -      R+   #C:  192:43.57 [swap]
 1001 68354 68326 295 131  0 13268     0 -      R+   #C:  192:42.13 [swap]
 1001 68331 68325 288 132  0  1224     0 -      R+   #C:    0:00.02 [creat]
 1001 68332 68325 288 132  0  1224     0 -      R+   #C:    0:00.02 [creat]
 1001 68333 68325 288 132  0  1224     0 -      R+   #C:    0:00.02 [creat]
 1001 68334 68325 288 132  0  1224     0 -      R+   #C:    0:00.03 [creat]
 1001 68335 68325 288 132  0  1224     0 -      R+   #C:    0:00.10 [creat]
 1001 68336 68325 288 132  0  1224     0 -      R+   #C:    0:00.07 [creat]
 1001 68361 68328 290 132  0  1232     0 -      R+   #C:    0:00.42 [tcp]
 1001 68362 68329 288 132  0  1252     0 -      R+   #C:    0:00.06 [udp]
 1001 68363 68329 288 132  0  1252     0 -      R+   #C:    0:00.04 [udp]
 1001 68368 68360 288 132  0  1320     0 -      R+   #C:    0:00.05 [tcp]
 1001 68369 68361 290 132  0  1320     0 -      R+   #C:    0:00.56 [tcp]
 1001 68387 68338 288 132  0  1656     0 -      R+   #C:    0:00.02 [sh]
 1001 68388 68340 288 132  0  1664     0 -      R+   #C:    0:00.02 [sh]
 1001 68389 68388 288 132  0     0     0 -      RE+  #C:    0:00.02 [swapinfo]
 1001 68390 68388 288 132  0  1204     0 -      R+   #C:    0:00.01 [tail]
    0    11     0 262 171  0     0     0 -      RL    ??  345:02.29 [idle: cpu0
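The TIME column is consistent with that idea: each swap process has accumulated more than 192 minutes of CPU time, while every other runnable user process shows well under a second. The small C sketch below quantifies this from the listing. The helper names are hypothetical, and accumulated CPU time is only indirect evidence of which processes hold the run queue.

```c
#include <stddef.h>
#include <stdio.h>

/* Convert a ps(1) TIME field of the form "MMM:SS.ss" to seconds;
 * returns -1.0 if the field does not parse. */
static double
ps_time_to_seconds(const char *field)
{
	unsigned long minutes;
	double seconds;

	if (sscanf(field, "%lu:%lf", &minutes, &seconds) != 2)
		return -1.0;
	return (double)minutes * 60.0 + seconds;
}

/* Fraction of the summed CPU time taken by the first n_top entries. */
static double
top_share(const char *const *times, size_t n, size_t n_top)
{
	double top = 0.0, total = 0.0;

	for (size_t i = 0; i < n; i++) {
		double s = ps_time_to_seconds(times[i]);

		if (s < 0.0)
			continue;	/* skip unparsable fields */
		total += s;
		if (i < n_top)
			top += s;
	}
	return total > 0.0 ? top / total : 0.0;
}

/* TIME column of the UID 1001 processes in the listing above. */
static const char *const user_times[] = {
	/* [swap] x 2 */
	"192:43.57", "192:42.13",
	/* [creat] x 6 */
	"0:00.02", "0:00.02", "0:00.02", "0:00.03", "0:00.10", "0:00.07",
	/* [tcp], [udp], [udp], [tcp], [tcp] */
	"0:00.42", "0:00.06", "0:00.04", "0:00.05", "0:00.56",
	/* [sh], [sh], [swapinfo], [tail] */
	"0:00.02", "0:00.02", "0:00.02", "0:00.01",
};
```

With these 17 values, top_share(user_times, 17, 2) comes out at about 0.99994, i.e. the two swap processes account for more than 99.99% of the CPU time charged to the runnable user processes.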

During a later freeze today, a "kill 1 <swap pid>" issued from kdb
unfroze the box.
-- 
Peter Holm
Received on Tue Feb 22 2005 - 14:48:24 UTC