On 11/9/11, Arnaud Lacombe <lacombar_at_gmail.com> wrote: > Hi, > > On Tue, Nov 8, 2011 at 9:35 PM, Julian Elischer <julian_at_freebsd.org> wrote: >> On 11/8/11 5:52 PM, Arnaud Lacombe wrote: >>> >>> Hi, >>> >>> On Tue, Nov 8, 2011 at 7:08 PM, Julian Elischer<julian_at_freebsd.org> >>> wrote: >>>> >>>> On 11/8/11 10:49 AM, Arnaud Lacombe wrote: >>>>> >>>>> Hi, >>>>> To avoid future complaints about the fact that I would be only "talk" >>>>> without "action", I did implement what I suggested above. As it is >>>>> quite a large patch-set, I will not post it directly here, however, it >>>>> is available on github: >>>>> >>>>> https://github.com/lacombar/freebsd/tree/master/topic/kern-lock-debug >>>>> >>>>> It convert a bunch of debug interface to use the caller instruction >>>>> pointer, as well as a proof-of-concept teaching printf(9) to convert >>>>> IP to symbol_name+offset. >>>>> >>>>> It translates in a direct saving of about +250kB on i386's GENERIC, >>>>> just in kernel text size. Even the worst case, ie LOCK_DEBUG == 0, >>>>> translates to a save of +80kB. >>>>> >>>>> Please note that this is still WIP code. >>>> >>>> A couple of comments. >>>> Firstly, the idea of a printf method to print the IP as symbol+offset is >>>> an >>>> interesting idea >>>> that should be followed up in its own right. >>>> >>> FWIW, I have no credit in this idea. It has been in Linux for ages and >>> ages. >> >> yeah as I said at work I use linux and BSD... >> the linux stuff that just prints out IP really annoys me. >> >> the list stuff and netgraph debug (which should be off in any production >> system) > this is, I guess, where we do not agree. You find it acceptable to run > totally different code in production and during debug. I do not. This > is completely insane, even more nowadays where heavy parallelism > increases the likelihood of races, and subtle change in the code, even > optimization, can cause total behavioral change (ie. Heisenbug). > > For the record, we have been tracking for more than 2 months (first > occurrences happened a year ago) an mbuf corruption in the network > stack, present in all released code since at least FreeBSD 7[0]. Each > time we think it is fixed, we are proven wrong by customers within a > few days when the system crashes again. Even the last attempt which > was believed to be bullet-proof failed and crashes daily. > > All that to say that production code should embed enough facilities to > allow the system to be fully debugged with a runtime cost as low as > possible, and a code-size cost as low as possible[1]. I should be able > to connect on a production machine, turn a knob, an see what is going > wrong, without the customer noticing. > > In the worst case, when you have to enable debug-only code, it must > not be done by making the non-debug case more expensive, but wrap > around. The whole original point of the patches was that LOCK_FILE and > LOCK_LINE are a bad answer to a wrong problem. > > `__FILE__, __LINE__' and the bloat introduced is not the problem, > `const char *file, int line' in way too much prototypes is. > > Now, you make me realize that `const char *file, int line' should just > be removed, not replaced by `unsigned long' or anything else. It's > likely to be done in another iteration. > >> just require you to be able to see the console. and have sources nearby. >> if you need the IP use gdb. >> > "console debugging" is yet another abomination which should be hunted > down. Just try to do any useful work at high-pps on a serial > console... > >> it's just what you are used to. You are obviously from the dark side >> ^H^H^H^H^H^H linux. >> > My obedience is totally irrelevant to the problem. > > However, if you want to know, my heart tends to be with BSDs. > Unfortunately, it's a sad love-story where your Beloved keeps > deceiving you day after day. You want to change small bits at a time, > make several iteration of progress to make things brighter, but your > Beloved refuses any change because of too much inertia. Sad. > >> so you are used to doing it that way.. but don't expect us to change just >> because that's what Linux does. >> > again, mentioning Linux is totally irrelevant. Use of Instruction > Pointer are implementation details for a not so intrusive solution to > the problem I pointed out, and which you are totally missing. > > Now, please answer this: do you find any of the bloat to the non-debug > case (ie. passing a NULL pointer and a 0 integer, when `LOCK_DEBUG == > 0') worth the extra debugability comfort to be acceptable ? > > If you do, then your focus is on making things comfortable for > developers, at the expense 100's of users, rather than making things > comfortable for 100's of users, at the expense of developers. > >> When we have a problem at work on teh Linux driver, my first step is >> always >> to try duplicate it on FreeBSD because: >> > well, you're lucky FreeBSD supports your device! Lately, we got lately > a shiny multi-queue network cards with bypass mechanism... that is not > supported in FreeBSD. So currently, we got an expensive paper-weight. > >> 1/ half the time freebsd will just immediatly assert on something and >> present you with the bug.. done. >> > well, certainly not from a release build; assertion are disabled. > >> 2/ I can run gdb through firewire on it on ANY standard unmodified kernel >> and find it, where on Linux I need to >> get a whole universe of stupid patches all aligned and MAYBE I might be >> able >> to see what is going on. >> if it's on redhat I need to do this, on ubuntu that, on suse something >> else >> ,and on different revisions >> of the kernel it all changes anyhow.. >> > machine (even x86-64 machines) I run FreeBSD on have no firewire, > neither do my desktops, this limit the usability of the feature. > Moreover, I do not use mass-distros either, except for desktop[3], but > small embedded firmware (ala. openwrt), even on "middle-end" system, > and stick to vanilla kernel (when not playing with -next). > >>> That said, IP address are barely used in FreeBSD, there is no legacy. >>> As such, the API should not use `unsigned long' but `void *'[0]; this >>> is the natural type returned by `__builtin_return_address()' and the >>> `&&' operator. This would allow to introduce a modifier to `%p' to do >>> the translation. >> >> possibly intptr_t is what should be used. but I'd expect Bruce to drop in >> here and let us us know. >> > whatever will suit you :) I started by using `uintptr_t', but they > cannot be printed in a portable manner and the printf(9) stuff was not > ready yet. > > - Arnaud > > [0]: I am able to crash any kernel between 7-STABLE to 9.STABLE within > minutes, with the right pattern and (mainstream and well supported) > hardware. Maybe this bug related to sk(4) driver? > > [1]: for example, when I read in `sys/kern/kern_fail.c': > > * Failpoints allow for injecting fake errors into running code on the fly, > * without modifying code or recompiling with flags. Failpoints are always > * present, and are very efficient when disabled. > > and see that "very efficient when disabled" translates into a > __predict_false() conditional, well, sorry, but I am very dubious. > Entering the branch _is_ still a cost. A most efficient way of doing > this would be using gcc's `asm goto' statement, in which case the hot > path only ever see a `nop', eventually patched at runtime. > > [2]: soekris-like boards do not qualify for "embedded" :) > > [3]: FreeBSD (8-STABLE) is way to limited and un-integrated to be > anywhere but useful, not to speak about kernel bug which leave the > system so fracked up that you have no other choice but to hard-reboot. > _______________________________________________ > freebsd-current_at_freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-current > To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org" >Received on Wed Nov 09 2011 - 13:14:05 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:20 UTC