Re: minidumps are unsafe on amd64

From: Scott Long <scottl_at_samsco.org>
Date: Fri, 25 Jan 2008 11:54:22 -0700

Ruslan Ermilov wrote:
> Hi,
> 
> Kernel minidumps on amd64 SMP can write beyond the bounds
> of the configured dump device, causing (as in our case) the
> file system data following the swap partition to be overwritten
> with the dump contents.
> 
> The problem is that while we're in the process of dumping
> mapped physical pages via a bitmap (in minidump_machdep.c),
> other CPUs continue to work and may modify page mappings of
> processes.  This in turn modifies pv_entries, which in turn
> modifies the bitmap of pages to dump.  As a result, we can
> dump more pages than we've calculated, and since dumps are
> written at the end of the dump device, we may end up writing
> past its end.
> 
> The attached patch mitigates the problem, but the real solution
> seems to be to disable interrupts (there's an XXX about this
> in kern_shutdown.c before calling doadump()) and to stop
> other CPUs, so we don't modify page tables while we're dumping.
> 
> This only affects 7.x/8.x amd64 SMP systems configured with
> minidump.  i386 systems aren't affected.
> 
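
For readers of the archive, the race described in the quoted message can
be sketched in userland C.  This is only an illustration under stated
assumptions, not the attached patch and not the code in
minidump_machdep.c: the names dump_bitmap, count_pages, write_pages,
mark_extra_pages and simulate_dump are hypothetical, and the "other CPU"
is simulated by a callback fired halfway through the write pass.  The
bound check in write_pages shows one possible mitigation (refuse to
write past the end of the device); it does not remove the race itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define NPAGES    64u   /* hypothetical: 64 tracked pages fit in one word */

/* One bit per physical page; a set bit means "include this page in the dump". */
static uint64_t dump_bitmap;

/* Sizing pass: count the pages currently marked for dumping. */
static unsigned count_pages(void)
{
    unsigned n = 0;
    for (unsigned i = 0; i < NPAGES; i++)
        if (dump_bitmap & (UINT64_C(1) << i))
            n++;
    return n;
}

/*
 * Write pass: walk the bitmap and "write" each marked page at dumplo.
 * Returns the number of pages written, or -1 if a write would cross
 * mediaend.  If other_cpu is non-NULL it is called once, halfway through
 * the walk, to simulate a concurrent change to the bitmap.
 */
static int write_pages(uint64_t dumplo, uint64_t mediaend,
    void (*other_cpu)(void))
{
    int written = 0;
    for (unsigned i = 0; i < NPAGES; i++) {
        if (i == NPAGES / 2 && other_cpu != NULL)
            other_cpu();
        if (!(dump_bitmap & (UINT64_C(1) << i)))
            continue;
        if (dumplo + PAGE_SIZE > mediaend)
            return -1;          /* would overwrite data after the device */
        dumplo += PAGE_SIZE;    /* stand-in for the actual device write */
        written++;
    }
    return written;
}

/* Simulated "other CPU": marks four more pages after sizing is done. */
static void mark_extra_pages(void)
{
    dump_bitmap |= UINT64_C(0xF) << 40;
}

/* Size the dump, place it at the end of the device, then write it. */
static int simulate_dump(uint64_t mediasize, bool racing)
{
    dump_bitmap = 0xFF;                         /* 8 pages marked up front */
    uint64_t dumpsize = (uint64_t)count_pages() * PAGE_SIZE;
    assert(mediasize >= dumpsize);
    uint64_t dumplo = mediasize - dumpsize;     /* dump ends at device end */
    return write_pages(dumplo, mediasize, racing ? mark_extra_pages : NULL);
}
```

With a 1 MB device, simulate_dump(1u << 20, false) writes the 8 pages it
sized for, while simulate_dump(1u << 20, true) hits the bound check
because the bitmap grew after the start offset was computed.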

Is this a case where you are manually triggering a dump on a
system that is otherwise running fine?  I thought that crashes
already disabled interrupts and made an attempt to stop other
CPUs.  That's why there is dump-specific code in every storage
driver in the first place; it implements polled I/O so that
crashdump I/O can take place with interrupts disabled.  If it's
a case where interrupts aren't actually getting disabled, then
that's one thing.  If it's a case where you're trying to fix
something that isn't broken, then I'm very cautious about the
added complexity that you're proposing.

Scott