On Thu, 26 Aug 2004, Craig Boston wrote:

> On Sun, Aug 22, 2004 at 12:05:39PM -0700, Doug White wrote:
> > If you have a reasonably fast i386 or amd64 multiprocessor and/or
> > hyperthreading machine and are experiencing reproducible hangs during -j
> > buildworlds and other highly parallel operations, please try this patch:
>
> Just a follow-up to my off-list message and another data point, with
> this patch I no longer get deadlocks, however I now get random data
> corruption.

Okay, for those of you experiencing the data corruption issue, I need to
know the following:

 . cvsup date & time for the affected kernel(s)
 . branch you're tracking
 . revision of src/sys/kern/kern_lock.c - I'm checking for a specific set
   of commits here
 . reproduction case - applications involved and a detailed description
   of the operation(s) involved

It would also be nice if you could set up a serial console and attempt to
break into the debugger with an NMI, if your system is so equipped. You'll
want to set these sysctls beforehand:

machdep.panic_on_nmi=0
debug.kdb.stop_cpus=0

That should prevent the usual suspects from disrupting your entry to ddb.
This usually works for me for getting into ddb in the IPI deadlock
situation.

If you are tracking RELENG_5, be aware that the patch is NOT committed
there, and cvsup will happily obliterate the changed files on its next
run. So be sure to reapply the patch after each cvsup until the patch is
merged, which should be Real Soon Now.

> Disabling the second processor or falling back to an older kernel (one
> from before the IPI hangs started) both fix the problem.

My guess here is that another change, one that was masked by the IPI
problems, is causing this, and getting SMP usable again has brought it
into the light.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite_at_gumbysoft.com       |  www.FreeBSD.org

Received on Thu Aug 26 2004 - 16:18:34 UTC