Yes, the subject sounds quite hairy, so please let me try to explain it. First, let's consider one concrete function: static int ukbd_poll(keyboard_t *kbd, int on) { struct ukbd_softc *sc = kbd->kb_data; if (!mtx_owned(&Giant)) { /* XXX cludge */ int retval; mtx_lock(&Giant); retval = ukbd_poll(kbd, on); mtx_unlock(&Giant); return (retval); } if (on) { sc->sc_flags |= UKBD_FLAG_POLLING; sc->sc_poll_thread = curthread; } else { sc->sc_flags &= ~UKBD_FLAG_POLLING; ukbd_start_timer(sc); /* start timer */ } return (0); } This "XXX cludge" [sic] pattern is scattered around a few functions in the ukbd code and perhaps other usb code: func() { if (!mtx_owned(&Giant)) { mtx_lock(&Giant); func(); mtx_unlock(&Giant); return; } // etc ... } I have a few question directly and indirectly related to this pattern. I. [Why] do we need this pattern? Can the code be re-written in a smarter (if not to say proper) way? II. ukbd_poll() in particular can be called from the kdb context (kdb_trap -> db_trap -> db_command_loop -> etc). What would happen if the Giant is held by a thread on some other CPU (which would be hard-stopped by kdp_trap)? Won't we get a lockup here? So shouldn't this code actually check for kdb_active? III. With my stop_scheduler_on_panic patch ukbd_poll() produces infinite chains of 'infinite' recursion because mtx_owned() always returns false. This is because I skip all lock/unlock/etc operations in the post-panic context. I think that it's a good philosophical question: what operations like mtx_owned(), mtx_recursed(), mtx_trylock() 'should' return when we actually act as if no locks exist at all? My first knee-jerk reaction was to change mtx_owned() to return true in this "lock-less" context, but _hypothetically_ there could exist some symmetric code that does something like: func() { if (mtx_owned(&Giant)) { mtx_unlock(&Giant); func(); mtx_lock(&Giant); return; } // etc ... What do you think about this problem? Should we re-write ukbd_poll() (and possibly usb code) or should mutex_owned() be adjusted? That question III actually brings another thought: perhaps the whole of idea of skipping locks in a certain context is not a correct direction? Perhaps, instead we should identify all the code that could be legitimately executed in the after-panic and/or kdb contexts and make that could explicitly aware of its execution context. That is, instead of trying to make _lock, _unlock, _owned, _trylock, etc do the right thing auto-magically, we should try to make the calling code check panicstr and kdb_active and do the right thing on that level (which would include not invoking those lock-related operations or other inappropriate operations). Thank you very much in advance for your insights and help! -- Andriy GaponReceived on Tue Aug 23 2011 - 10:09:19 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:17 UTC