On Thu, 6 Aug 2009, Larry Rosenman wrote:
> On Thu, 6 Aug 2009, Robert Watson wrote:
>
>> On Tue, 4 Aug 2009, Navdeep Parhar wrote:
>>
>>>>> This occurs on today's HEAD + some unrelated patches. That makes it
>>>>> 8.0BETA2+ code. I haven't tried older builds.
>>>>
>>>> We have finally been able to reproduce this ourselves yesterday and
>>>
>>> Well, it happens every single time on all of my amd64 machines. After I'd
>>> already sent my email I noticed that the netisr mutex has an odd address
>>> (pun intended :-))
>>>
>>> m=0xffffffff8144d867
>>
>> Heh, indeed. We just spotted the same result here. In this case it's
>> causing a panic because it leads to a non-atomic read due to mtx_lock
>> spanning a cache line boundary, followed shortly by a panic because it's
>> not a valid thread pointer when it's dereferenced, as we get a fractional
>> pointer.
> [snip]
>
> Do we have an ETA for a testable patch?

RSN, I'm afraid. We can eliminate the effect by reverting the use of DPCPU in netisr.c (basically reverting to pre-r195019 of netisr.c). The interesting question is where the problem originates: is gcc/ld/etc. not laying out the ELF section properly, or are the MD parts not providing an aligned base? There are also probably issues in the DPCPU handling of modules along similar lines, but first things first.

We'll be adding alignment assertions to the various lock init functions to catch this happening explicitly in the future. There are probably one or two other places where we have very strong alignment requirements on i386/amd64, such as the td_ucred pointer that we check for change on system calls/traps to see if we need to refresh the thread's credential from the process credential.

Robert N M Watson
Computer Laboratory
University of Cambridge

Received on Thu Aug 06 2009 - 12:11:27 UTC