Re: syscons(4) races

From: Maksim Yevmenkin <emax_at_freebsd.org>
Date: Mon, 24 Nov 2008 11:23:47 -0800
Bruce,

thanks for the clarifications. please read my comments/answers inline

>> ok, in this case i have a somewhat stupid question. when kbdmux(4) is
>> the default keyboard, all those kbdd_xxx() calls (that sccngetch()
>> makes)  will call into kbdmux(4) functions that will grab giant.
>> kbdmux was changed about 2 months ago to do that. it sounds like those
>> changes are completely wrong. am i correct here?
>
> Yes, the low-level console functions are supposed to be reentrant (more
> details below), so they cannot do things like that.

all right. i will need to back those changes out then.

>>> BTW, sccngetch() shouldn't exist.  It existed to demultiplex the 2
>>> console driver entry points sc_cngetch() and sc_cncheckc(), but
>>> sc_cncheckc() has gone away.
>>
>> the entry point in consdev struct is still there. also cncheckc() is
>> trying to call it (if its present). so it looks like it may not be
>> completely gone, just sysconst(4) no longer implements it.
>
> Ah, this is compatibility cruft which is bogusly only used by a driver
> that is newer than the cruft (xen).  Normal drivers use CONSOLE_DRIVER()
> but xen uses the old interface CONS_DRIVER().  CONSOLE_DRIVER() differs
> from CONS_DRIVER() only in not taking the cn_checkc() entry point.
> Binary compatibility and correct naming were broken by renaming
> cn_checkc() to cn_getc().  Names are still correct at the top level
> (there, cncheckc() gives the non-blocking interface and cngetc() gives
> the blocking interface, while in normal drivers there is now only
> cn_getc() which gives the non-blocking interface).
>
> It is cn_dbctl() that has completely gone away.

ok. thanks for the explanation.

>> There is still a problem for calls to sc_cngetc() from the low-console
>>> driver.  These can race with sckbdevent().  In debugger mode, no locking
>>> is permitted, so syscons' getc functions must be carefully written to
>>> do only harmless things when they lose races.  They are not carefully
>>> written (e.g., almost the first thing they do, shown in the above, may
>>> give a buffer overrun:
>>>
>>>   if (fkeycp < fkey.len) {
>>>       mtx_unlock(&Giant);
>>>       splx(s);
>>>       return fkey.str[fkeycp++];
>>>   }
>>
>> all right, so we can not use locks, i'm guessing this includes spin
>> locks too. can we use atomic operations here? since, in polling mode,
>> consumer is going to call getc() in a loop, can we use atomic
>> reference counter to make sure there is only one caller running at a
>> time? if someone already grabbed reference counter just return -1 as
>> if there is no input?
>
> It is safer to use local spinlocks and safer still to use home-made
> locks based on atomic_cmpset(), but still wrong to do simply that since
> it can cause deadlock, especially for debugging -- it is not possible to
> make sure that there is only 1 caller running, since debugging (or NMI,
> or perhaps an ordinary interrupt that cannot reasonably be masked) can
> preempt any caller.  sio uses its local spinlock (I don't like this)
> together with a hack to allow reentering from the debugger only:
>
> %       need_unlock = 0;
> %       if (!kdb_active && sio_inited == 2 && !mtx_owned(&sio_lock)) {
> %               mtx_lock_spin(&sio_lock);
> %               need_unlock = 1;
> %       }
>
> This is only done for cn_putc().  cn_getc() (really cn_checkc()) is
> completely missing locking (except in debugger mode) and needs
> higher-level locking anyway, since the whole polling loop in cngetc()
> needs to be locked to prevent interference.

all right, so i'm guessing chgetc() or rather cncheckc() in
kern_cons.c (or perhaps even gets() in libkern/) should be locked
using some sort of spinlock. cnputs() seems to be using cnputs_mtx.
maybe we need cngets() or something like this?

one thing that i do not really like about sccngetch() in syscons is
that its constantly trying to switch keyboard mode and keyboard
polling. i was wondering if it would be possible to clearly separate
sckbdevent() path from debugger/etc. the former would be interrupt
driven and the later would be poll driven.

> Here sio_lock is the local spinlock (actually per-driver so it isn't very
> local), sio_inited is a flag to prevent use of sio_lock here before
> sio_lock is initialized (this flag requires delicate initialization
> using atomic_cmpset to avoid races), and !kdb_active is the hack to
> let the debugger in irrespective of the state of the lock and without
> touching the lock.

i see.

> Any lock used for the console must be MTX_QUIET and/or MTX_NOWITNESS
> to prevent console i/o generating endless i/o about itself.  I think
> it must be a spinlock too.  The lock thus reduces to a simple spinlock
> which can be implemented directly using atomic_cmpset(), thus making it
> clear that the lock has no interactions with other parts of the system.

right

here is some straw-man patch that Eygene Ryabinkin and myself were
toying with. obviously its far from complete, but we'd like to get
some feedback to see if we are going in the right direction.

thanks,
max

Received on Mon Nov 24 2008 - 18:23:49 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:38 UTC