Re: syscons(4) races

From: Maksim Yevmenkin <emax_at_freebsd.org>
Date: Thu, 20 Nov 2008 16:25:40 -0800
[moving to -current]

Bruce,

ok, i'm looking at it again...

[...]

>>  More locking for syscons(4). This should prevent races with sckbdevent().
>
> This cannot be the correct fix.  It should cause panics due to failure
> of the (missing) assertion that no locks may be accessed in debugger mode.
> Locks may not be accessed in debugger mode because:
> (1) locks may be in an inconsistent state (when locking code is being
>     debugged) (unless they are made to work virtually -- switch all locks
>     to a safe state while in debugger mode, but display the unswitched
>     state in all debugger commands)
> (2) deadlock is possible (would also be avoided by virtualization).

ok, fair enough.

>> --- head/sys/dev/syscons/syscons.c      Sun Nov 16 21:57:54 2008
>>  (r185012)
>> +++ head/sys/dev/syscons/syscons.c      Sun Nov 16 22:39:04 2008
>>  (r185013)
>> _at__at_ -1572,6 +1572,7 _at__at_ sccngetch(int flags)
>>    int s = spltty();   /* block sckbdevent and scrn_timer while we poll */
>
> The comment still says that this is what blocks sckbdevent().
>
> The comment was wrong too for debugger mode -- in debugger mode,
> interrupts are masked in hardware and other CPUs are stopped, so nothing
> except possibly this functions internals can call sckbdevent().

ok

> Internal calls to scrn_timer() used to be prevented by sccndbctl()
> setting syscons' `debugger' flag and this function and others checking
> this flag.  This has been lost.  I think the flag isn't needed and
> wasn't used to protect sckbdevent(), and your problem has nothing to
> do with debugger mode except for breaking it.  Instead your problem
> is a layering one (continued below (*)).

ok

>>    int c;
>>
>> +    mtx_lock(&Giant);
>>    /* assert(sc_console != NULL) */
>>
>>    /*
>
> Acquiring Giant in debugger mode is especially invalid (except deadlock
> is not so likely for a recursive lock).

ok, in this case i have a somewhat stupid question. when kbdmux(4) is
the default keyboard, all those kbdd_xxx() calls (that sccngetch()
makes)  will call into kbdmux(4) functions that will grab giant.
kbdmux was changed about 2 months ago to do that. it sounds like those
changes are completely wrong. am i correct here?

> BTW, sccngetch() shouldn't exist.  It existed to demultiplex the 2
> console driver entry points sc_cngetch() and sc_cncheckc(), but
> sc_cncheckc() has gone away.

the entry point in consdev struct is still there. also cncheckc() is
trying to call it (if its present). so it looks like it may not be
completely gone, just sysconst(4) no longer implements it.

> (*) syscons has several getc routines.  The layering and locking for
> these should be something like:
>
>     sc_cngetc(): must not access any locks; currently implemented by
>        calling scgetc(); therefore:
>     scgetc(): must not access any locks

ok.

>     sccngetch(): bogus layering -- see above
>     sckbdevent(): I think this is the entry point for all normal (non-
>        low-level-console) syscons input.  It is currently implemented
>        by calling scgetc(), which must not access any locks.  Therefore,
>        any locking must be in this function.  I think it currently uses
>        Giant locking.

yes, it does.

> There is still a problem for calls to sc_cngetc() from the low-console
> driver.  These can race with sckbdevent().  In debugger mode, no locking
> is permitted, so syscons' getc functions must be carefully written to
> do only harmless things when they lose races.  They are not carefully
> written (e.g., almost the first thing they do, shown in the above, may
> give a buffer overrun:
>
>    if (fkeycp < fkey.len) {
>        mtx_unlock(&Giant);
>        splx(s);
>        return fkey.str[fkeycp++];
>    }

all right, so we can not use locks, i'm guessing this includes spin
locks too. can we use atomic operations here? since, in polling mode,
consumer is going to call getc() in a loop, can we use atomic
reference counter to make sure there is only one caller running at a
time? if someone already grabbed reference counter just return -1 as
if there is no input?

> since the increment is not atomic with the bounds check), but they
> mostly work.  (Syscons' putc routines have a much larger number of
> races like this, but they mostly work too.  It used to be easy to cause
> panics by racing normal console output with printfs from an interrupt
> handler (use a high frequency timeout interrupt handler that prints
> something), but almost any locking would fix that and I think Giant-locking
> everything fixed it accidentally (**).  So there is only a problem
> modes like debugger mode where locking is not permitted.)
>
> The races for sc_cngetc() were normally limited:
> - most calls were in debugger mode, and debugger mode limits problems
> - the only other calls were for things like gets() for non-auto
>   mountroot and cngetc() for the "hit any key to reboot".  This case
>   might even supply the needed and permitted Giant locking accidentally.
>
> However, you seem to have made the races very common by (ab)using cngetc()
> for keyboard multiplexing.  cngetc() is not designed for this.  You need
> to know syscons' internals and do the necessary Giant locking in the caller.

this is what i'm not getting. are you saying that lower layer keyboard
drivers can not use any locking whatsoever in polling mode (or rather
debug mode)? are you saying that we need a completely different path
for handling keyboard input in debug mode? and/or possibly polling
mode?

thanks,
max
Received on Thu Nov 20 2008 - 23:58:06 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:39:37 UTC