Re: FW: Call for comments: CoxR, a CVS/mail-lists/BTS

From: Robert Watson <rwatson_at_FreeBSD.org>
Date: Mon, 7 Feb 2005 15:21:57 +0000 (GMT)

On Sun, 6 Feb 2005, ALeine wrote:

> Oh come, FreeBSD 5.x does have a mutex hell going on, but to say it has
> so many bugs as to require a truck is absurd. :-> A smaller lorry
> perhaps, but a truck - definitely not. :-) It might also be a good idea
> to use an automated spell-check on your pages, I've noticed a number of
> typos such as "divelopers" and similar. 

I appreciate that not everyone is a fan of mutex synchronization, but
"mutex hell" is a bit of an odd description: most bugs I see getting
reported (and fixed) aren't even locking-related.  They generally stem
from a lack of testing exposure for more obscure features or edge
cases that are hard to test for without a wide testing base, such as
edge-case hardware, bugs associated with longer run times, or a recently
introduced feature, etc. Generally speaking, in the last week, I saw a
couple of classes of bug fix fly by in commits, in order of frequency of
occurrence:

- Minor device driver bugs involving alignment, feature mapping for device
  IDs, attach/detach bugs, error handling, etc.  In one case, the bug was
  that a device driver was able to run MPSAFE, but the flag was set
  incorrectly to not let it.  As usual, a moderate amount of change in
  ACPI.  This was the vast majority of bug fixes.

- Network stack logical errors or C-related errors: generally, doing
  something wrong with mbufs or routing.  Mostly "syntax" and not
  "semantics", although there were a couple of netflow bugs that were
  more serious and the result of broader exposure since its commit (last
  month?).

- Scheduling related bugs in ULE -- Jeff MFC'd a number of fixes to
  RELENG_5 for the first time in several months, so there was some
  backlog, but I think it's not unusual to see a trickle of
  scheduling-related changes, so it isn't entirely unrepresentative.

- VFS/file system bugs -- a couple were locking related as a result of
  Jeff's on-going work to get Giant off of the file system code, but more
  were associated with on-going buffer cache work by Poul-Henning.

While I haven't made any attempt to determine if the last week is
"typical" of long term bug fixes, it was easily on-hand, and the results
are suggestive.  Locking, as with other complex changes in the OS, comes
with bugs, but it's hardly "hell" :-).  One of the nice things about the
locking approach to synchronization is that it comes with a strong
assertion model: this means you can often find bugs without actually
triggering the symptoms of the bugs, which may be difficult to trigger or
very sensitive to timing.  So when there are locking bug fixes, they're
more often found through a WITNESS warning than an exercised bug.  When I do
complex application pthreads programming, I often wish it had the
threading/locking debugging facilities the FreeBSD kernel has :-).
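
To illustrate the idea of catching a locking bug without triggering its
symptoms, here is a minimal userland sketch of lock-order checking in
Python.  It is not the WITNESS implementation: the OrderChecker class, its
methods, and the lock names "vnode" and "buf" are all hypothetical, chosen
only for the example.  The checker records the order in which locks are
taken and warns when a later acquisition contradicts an earlier order, even
though no deadlock actually happens in the run.

```python
import threading
from collections import defaultdict


class OrderChecker:
    """Toy lock-order checker; hypothetical, not the WITNESS implementation."""

    def __init__(self):
        # after[a] holds every lock seen acquired while a was already held.
        self.after = defaultdict(set)
        self.held = threading.local()   # per-thread list of held lock names
        self.warnings = []
        self.guard = threading.Lock()   # protects the shared order graph

    def _reachable(self, src, dst):
        # Depth-first search: is there a recorded order src -> ... -> dst?
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.after[node])
        return False

    def acquire(self, name):
        held = getattr(self.held, "stack", None)
        if held is None:
            held = self.held.stack = []
        with self.guard:
            for prior in held:
                # Taking `name` after `prior` contradicts an earlier
                # recorded order in which `name` came before `prior`.
                if self._reachable(name, prior):
                    self.warnings.append(
                        f"lock order reversal: {prior} -> {name}")
                else:
                    self.after[prior].add(name)
        held.append(name)

    def release(self, name):
        self.held.stack.remove(name)


checker = OrderChecker()

# First code path: "vnode" and then "buf".
checker.acquire("vnode")
checker.acquire("buf")
checker.release("buf")
checker.release("vnode")

# Second code path: the opposite order.  Nothing deadlocks in this
# single-threaded run, but the inconsistency is still reported.
checker.acquire("buf")
checker.acquire("vnode")
checker.release("vnode")
checker.release("buf")

print(checker.warnings)   # ['lock order reversal: buf -> vnode']
```

The point of the sketch is that the warning depends only on the orders
observed, not on two threads actually colliding, which is why this style of
checking can flag bugs that are hard to reproduce by timing alone.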

Robert N M Watson
Received on Mon Feb 07 2005 - 14:22:57 UTC
