On Tue, 29 Jun 2004, Colin Percival wrote: > At 08:39 29/06/2004, Daniel Lang wrote: > >login: lock order reversal > > 1st 0xc070a0c0 sched lock (sched lock) _at_ /usr/src/sys/kern/kern_proc.c:672 > > 2nd 0xc0745d40 sio (sio) _at_ /usr/src/sys/dev/sio/sio.c:3205 > >Stack backtrace: > >backtrace(ffffffff,c0712948,c0712b00,c06e25c4,c07358dc) at 0xc051e066 = backtra2 > >witness_checkorder(c0745d40,9,c06b1e34,c85,3f8) at 0xc05393c4 = witness_checkor4 > >_mtx_lock_spin_flags(c0745d40,0,c06b1e34,c85,c0712498) at 0xc0516e9e = _mtx_loce > >siocnputc(c06f9c40,63) at 0xc064375f = siocnputc+0x6b > >cnputc(63) at 0xc05483ed = cnputc+0x4d > >putchar(63,e5bd87e0) at 0xc053387a = putchar+0x92 > >kvprintf(c069d236,c05337e8,e5bd87e0,a,e5bd8800) at 0xc0533a87 = kvprintf+0x77 > >printf(c069d236,32cf1,0,32ce6,0,dd8,c9115828) at 0xc0533763 = printf+0x43 > >calcru(c91156e0,e5bd8ad0,e5bd8ad8,0,e5bd8a10) at 0xc051cb9e = calcru+0x1f2 > >[...] > > Regardless of the questionable validity of the other problems this machine > is encountering, The bug in calcru() was supposed to be fixed. Anyway, printf() must work from anywhere (including nested in itself). > this LOR is real (and ugly). calcru() asserts that it is > holding sched_lock, but it can printf() about negative runtime or runtime > going backwards. With a serial console attached, these messages end up at > siocnputc(), which needs sio_lock. > > Now, I presume that there is some reason why the locking order hard-coded > into witness makes it illegal to pick up sio_lock while holding sched_lock, > but I can't see it. Maybe someone else can explain? This is mianly just a bug in the witness order. Console output must work at any time, including when the "any" lock is held, so any locks used by console output routines must be legal to pick up while holding the "any" lock. I had 2 different fixes for this and one related one queued: (1) Witnessing of sio_lock can be broken by turning it off. Note that non-quietness of witnessing of sio lock is already turned off. %%% Index: sio.c =================================================================== RCS file: /home/ncvs/src/sys/dev/sio/sio.c,v retrieving revision 1.442 diff -u -2 -r1.442 sio.c --- sio.c 25 Jun 2004 10:51:33 -0000 1.442 +++ sio.c 26 Jun 2004 23:11:13 -0000 _at__at_ -526,7 +644,7 _at__at_ while (sio_inited != 2) if (atomic_cmpset_int(&sio_inited, 0, 1)) { + /* XXX might need MTX_QUIET instead of MTX_NOWITNESS.*/ mtx_init(&sio_lock, sio_driver_name, NULL, - (comconsole != -1) ? - MTX_SPIN | MTX_QUIET : MTX_SPIN); + MTX_SPIN | MTX_NOWITNESS); atomic_store_rel_int(&sio_inited, 2); } %%% The patch also fixes some style bugs (non-non-KNF continuation indents). It uses MTX_NOWITNESS unconditionally to avoid related problems when the port is a debugger port. (2) My version doesn't actually use sio_lock. It disables interrupts for the !SMP case and also uses home made spin locking based on atomic_cmpset() in the SMP case. Neither of these is witnessed, so the problem goes away in the same way as for MTX_NOWITNESS. (3) It is an error to acquire a lock while ddb is active, but sio console output acquires sio_lock (if it is not already held) for ddb output. The lock should only be acquired if !db_active. I'm surprised that this LOR isn't more common. Console output is done with sched_lock any time code that holds sched_lock is debugged. I normally wouldn't see the problem since I don't use WITNESS much. BruceReceived on Thu Jul 01 2004 - 01:55:03 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:37:59 UTC