Ruslan Ermilov wrote:
> Hi,
>
> Kernel minidumps on amd64 SMP can write beyond the bounds
> of the configured dump device causing (as in our case) the
> file system data following swap partition to be overwritten
> with the dump contents.
>
> The problem is that while we're in the process of dumping
> mapped physical pages via a bitmap (in minidump_machdep.c),
> other CPUs continue to work and may modify page mappings of
> processes. This in turn causes the modifications to
> pv_entries, which in turn modifies the bitmap of pages to
> dump. As the result, we can dump more pages than we've
> calculated, and since dumps are written to the end of the
> dump device, we may end up overwriting it.
>
> The attached patch mitigates the problem, but the real solution
> seems to be to disable interrupts (there's an XXX about this
> in kern_shutdown.c before calling doadump()), and stopping
> other CPUs, so we don't modify page tables while we're dumping.
>
> This only affects 7.x/8.x amd64 SMP systems configured with
> minidump. i386 systems aren't affected.
>

Is this a case where you are manually triggering a dump on a system
that is otherwise running fine? I thought that crashes already
disabled interrupts and made an attempt to stop other CPUs. That's
why there is dump-specific code in every storage driver in the first
place; it implements polled i/o so that crashdump i/o can take place
with interrupts disabled.

If it's a case where interrupts aren't actually getting disabled,
then that's one thing. If it's a case where you're trying to fix
something that isn't broken, then I'm very cautious about the added
complexity that you're proposing.

Scott

Received on Fri Jan 25 2008 - 18:23:19 UTC
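For illustration only, the overrun described in the quoted report can be sketched in user-space C. This is not the attached patch (which is not reproduced in this message) and not code from minidump_machdep.c; the names count_pages(), dump_pages() and budget_pages are hypothetical, and the bitmap is a plain array rather than the kernel's real structure. The sketch assumes the dump size is computed once from the bitmap, and that the bitmap may gain bits before the write loop runs. The write loop then refuses to emit more pages than were budgeted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in the page bitmap: one bit per page to be dumped. */
static size_t count_pages(const uint64_t *bitmap, size_t nwords)
{
    size_t n = 0;
    for (size_t i = 0; i < nwords; i++)
        for (uint64_t w = bitmap[i]; w != 0; w &= w - 1)
            n++;                    /* each iteration clears the lowest set bit */
    return n;
}

/*
 * Walk the bitmap and "write" every marked page, but never more than
 * budget_pages, the count computed before the dump started.
 * Returns the number of pages written and sets *truncated to 1 if the
 * bitmap held more pages than the budget allowed.
 */
static size_t dump_pages(const uint64_t *bitmap, size_t nwords,
                         size_t budget_pages, int *truncated)
{
    size_t written = 0;

    assert(truncated != NULL);
    *truncated = 0;
    for (size_t i = 0; i < nwords; i++) {
        for (unsigned bit = 0; bit < 64; bit++) {
            if ((bitmap[i] & (UINT64_C(1) << bit)) == 0)
                continue;
            if (written == budget_pages) {
                /* Bitmap grew after sizing: stop instead of overrunning. */
                *truncated = 1;
                return written;
            }
            /* A real dumper would write the page to the device here. */
            written++;
        }
    }
    return written;
}
```

A check like this only bounds the damage. It does not make the dump consistent, which is why stopping the other CPUs is described above as the real solution.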