Q&A on textdumps

From: Robert Watson <rwatson_at_FreeBSD.org> Date: Sun, 30 Dec 2007 13:11:29 +0000 (GMT)

Dear all,

I've received a few textdump-related questions, and thought I'd share my 
answers.

(1) What information is in a textdump?

The textdump is stored as a tarfile with several subfiles in it:

config.txt - Kernel configuration, if compiled into kernel
ddb.txt - Captured DDB output, if present
msgbuf.txt - Kernel message buffer
panic.txt - Kernel panic message, if there was a panic
version.txt - Kernel version string

It is easy to add new files to textdumps, so if there's some easily 
extractable kernel state that you feel should go in there, drop me an e-mail 
and/or send a patch.
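
Since a textdump is an ordinary tarfile, tar(1) is enough to look inside one. 
The following is a minimal sketch rather than an official tool: the function 
name textdump_show is made up for illustration, and it assumes the member 
names match the list above.

```sh
# List the members of a textdump tarball, then print one of them.
# $1: path to the tarball
# $2: member to print (defaults to panic.txt)
# textdump_show is a hypothetical helper name, not part of FreeBSD.
textdump_show() {
    tarball=$1
    member=${2:-panic.txt}

    # Show which subfiles are present (some are optional).
    tar -tf "$tarball" || return 1

    # Print the chosen subfile to stdout without unpacking to disk.
    echo "----- $member -----"
    tar -xOf "$tarball" "$member"
}
```

For example, "textdump_show /var/crash/textdump.tar.0 ddb.txt" would print the 
captured DDB output, assuming that is the name your tarball was saved under in 
/var/crash.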

(2) Is there any information in a textdump that can't be acquired using kgdb 
and other available dump analysis tools?

In principle no, as normal dumps include all kernel memory, and textdumps 
operate by inspecting kernel memory using DDB, capturing only small but 
presumably relevant parts.  However, there are some important differences in 
approach that mean that textdumps can be used in ways that regular dumps can't 
easily be:

- DDB textdumps are very small. Including a full debugging session, kernel 
message buffer, and kernel configuration, my textdumps are frequently around 
100k uncompressed. This makes it possible to use them on very small machines, 
store them for an extended period, e-mail them around, etc, in a way that you 
can't currently do with kernel memory dumps. This improved usability will 
(hopefully) improve our bug and crash management.

- DDB is a specialized debugging tool with intimate knowledge of the kernel, 
and there are types of data trivially extracted with DDB that are awkward or 
quite difficult to extract using kgdb or other currently available dump 
analysis tools. Locking, waiting, and process information are examples where 
automatic extraction is currently possible only with DDB, which is one of the 
reasons many developers prefer to begin any diagnosis with an interactive DDB 
session.

- DDB textdumps can be used without the exact source tree, kernel 
configuration, built kernel, and debug symbols, as they interpret rather than 
save the pages of memory. They even use an architecture-independent file format, 
so you don't need a cross-debugger. Having that additional context is useful 
(ability to map symbol+offset to line of code), but you can actually go a 
remarkable way without it, especially looking at the results in a PR 
potentially years later.

(3) What do I lose by using textdumps?

To be clear, there are also some important things that textdumps can't do -- 
principally, a textdump doesn't contain all kernel memory, so your textdump 
output is all you have. If you need to extract detailed structure information 
for something DDB doesn't understand, or that you don't think of in advance or 
during a DDB session, then there's nothing to fall back on except configuring 
a textdump or regular dump and waiting for the panic to happen again.

(4) When should I use textdumps?

Minidumps remain the default in 7.x and 8.x, and full dumps remain the default 
in 6.x and earlier. Textdumps must be specifically enabled by the 
administrator to be used.

DDB is an excellent live debugging tool whose use has been limited to 
situations where there is an accessible video console or, ideally, a serial 
or FireWire console to a second box, and has generally required an experienced 
developer to be available to drive debugging. There are many problems that can 
be pretty much instantly understood with a couple of DDB commands, so these 
limitations impacted debugging effectiveness.

The goal of adding DDB capture output, scripting, and textdumps was to broaden 
the range of situations in which DDB could be used: now it is usable more 
easily for post-mortem analysis, no console or second machine is required, and 
a developer can install, or even e-mail, a script of DDB commands to run 
automatically. Developers can simply define a few scripts to handle various 
DDB cases, such as panic, and get a nice debugging bundle to look at later.

When I'm debugging network stack problems, I typically want a fairly small set 
of DDB commands to be run by the user, and the output sent back, and now it 
will go from "Read the chapter on kernel debugging, set up a serial console, 
run the following commands, copy and paste from your serial console -- oh, you 
don't have a serial console, perhaps hand-copy these fields or use a digital 
camera" to "run the following ddb(8) command and when the box reboots, send me 
the tarball in /var/crash".
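
As a sketch of what such a "following ddb(8) command" might look like, here is 
a panic script that schedules a textdump and captures the same DDB commands 
used elsewhere in this message. The wrapper function name is hypothetical, and 
the choice of DDB commands is only an example; pick whatever is relevant to 
the problem being chased.

```sh
# Install a DDB script that runs automatically when the kernel enters the
# debugger because of a panic. "textdump set" schedules a textdump, so the
# later "call doadump" writes a textdump rather than a memory dump.
# set_panic_textdump_script is a hypothetical name used for illustration.
set_panic_textdump_script() {
    ddb script "kdb.enter.panic=textdump set;capture on;show pcpu;trace;ps;show locks;alltrace;show alllocks;show lockedvnods;call doadump;reset"
}
```

After running the function as root, "ddb scripts" should list the installed 
script so the user can confirm it took effect before waiting for the panic.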

I anticipate that textdumps will see use when developers are exchanging e-mail 
with users reporting problems and trying to gather concise summaries of 
information about a crash with minimum downtime and maximum portability, in 
embedded environments where dumping kernel memory to flash is tricky, or in 
order to save a transcript of an interactive DDB session when testing new 
features locally.

Another interesting advantage of textdumps is that it's easy to inspect them 
for confidential/identifying information and mask or purge it. When someone 
sends out a kernel memory dump, it potentially contains a lot of sensitive 
information, and most people (including me) would have difficulty making sure 
all sensitive information was purged safely.

(5) I want to collect DDB output, but still need memory dumps -- can I do 
both?

Yes and no.

Yes, you can use the DDB output capture buffer and scripting without using a 
textdump, as the capture buffer is stored in kernel memory. You can print it 
using kgdb, and we should probably add that capability to ddb(8) also. End 
your script with "call doadump; reset" but don't "textdump set". For example:

   ddb script kdb.enter.panic="capture on;show pcpu;trace;ps;show locks;alltrace;show alllocks;show lockedvnods;call doadump;reset"

No, because you must pick one of the three dump layouts (dump, minidump, 
textdump) to write to the swap partition -- you can't write out all three and 
then decide which to extract later. In principle this could be changed so that 
we actually write out a textdump section and a full/minidump, but that's not 
implemented.

(6) I have a serial console, so I don't need textdumps. Can I still use DDB 
scripting to manage a crash?

Yes. You can set up scripts in exactly the same way as with textdumps, only 
omit the textdump bits and end with a "reset" to reboot the system when done. 
That way you can extract the results from the serial console log. I.e.,

   ddb script kdb.enter.panic="show pcpu;trace;show locks;ps;alltrace;show alllocks;show lockedvnods;reset"

(7) I'm in DDB and I suddenly realize I want to save the output, and I haven't 
configured textdumps. What do I do?

As with normal dumps, you must previously have configured support for a dump 
partition. These days, that is done automatically whenever you have swap 
configured on the box, so unless you're in single-user mode or don't have swap 
configured, you should be able to do the following:

Schedule a textdump using the "textdump set" command.

Turn on DDB output capture using "capture on", run your commands of interest, 
and turn it off using "capture off".

Type "call doadump" to write out the dump (a textdump, since one has been 
scheduled), and "reset" to reboot.

(8) The buffer is small, can I pick and choose what DDB output is captured?

The capture buffer does have a size limit, so you might find you want to 
explore interactively at first to figure out what information to save. Then 
you can turn it on and off around output to capture with "capture on" and 
"capture off". Each time you turn capture back on, new output is appended 
after any existing output.

If you decide you want to clear the buffer, you can use "capture reset" to do 
that, and you can check the status of the buffer using "capture status".

You can also increase the buffer size by setting the debug.ddb.capture.bufsize 
sysctl to a larger size.  The sysctl will automatically round up to the next 
textdump blocksize.
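
To illustrate the rounding, here is a small sketch. It assumes a textdump 
blocksize of 512 bytes, which you should check against your kernel sources; 
the function names are hypothetical. The kernel does the rounding itself, so 
the arithmetic is only there to show what value to expect when you read the 
sysctl back.

```sh
# Assumed textdump block size in bytes; verify against your kernel sources.
TEXTDUMP_BLOCKSIZE=512

# Round a requested size up to the next multiple of the block size.
round_up_to_block() {
    echo $(( ($1 + TEXTDUMP_BLOCKSIZE - 1) / TEXTDUMP_BLOCKSIZE * TEXTDUMP_BLOCKSIZE ))
}

# Ask the kernel for a capture buffer of (at least) the given size in bytes.
set_capture_bufsize() {
    sysctl debug.ddb.capture.bufsize="$1"
}
```

With these definitions, "round_up_to_block 100000" prints 100352, which is the 
size you would expect to see after "set_capture_bufsize 100000" under the 
512-byte assumption.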

(9) Can I continue the kernel after doing a textdump?

No. As with kernel memory dumps, textdumps invoke the storage controller 
dumper routine, which may hose up state in the device driver, preventing its 
use after the dump is generated.

However, if you do plan to continue from DDB, just use DDB output capture 
without a textdump. You can then extract the contents of the DDB buffer using 
the debug.ddb.capture.data sysctl.
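
A minimal sketch of saving the buffer to a file once the system is running 
again (the function name and the output path in the example are made up for 
illustration):

```sh
# Write the current contents of the DDB capture buffer to a file.
# $1: destination file
# save_ddb_capture is a hypothetical helper name.
save_ddb_capture() {
    # -n prints only the value, without the "name: " prefix.
    sysctl -n debug.ddb.capture.data > "$1"
}
```

For example, "save_ddb_capture /var/tmp/ddb-session.txt" after continuing from 
DDB.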