Re: signal 8 (floating point exception) upon resume

From: Bruce Evans <brde_at_optusnet.com.au> Date: Wed, 5 Mar 2014 08:50:01 +1100 (EST) · This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:40:47 UTC

On Tue, 4 Mar 2014, John Baldwin wrote:

> On Monday, March 03, 2014 6:49:08 pm Adrian Chadd wrote:
>> I'll try this soon.
>>
>> I had it fail back to newcons, rather than Xorg normally dying without
>> restoring state. It wouldn't let me spawn a shell. Logging in worked
>> fine, but normal shell exec would eventually and quickly lead to
>> failure, dropping me back to the login prompt.
>
> If you have set CPUTYPE in /etc/src.conf such that your userland binaries
> are built with SSE, etc. then I expect most things to break because the FPU
> is in a funky state without this patch.  I suspect if you don't set CPUTYPE
> so that your userland binaries do not use the FPU, you can probably resume
> just fine without this fix.

Non-SSE FPU state might be broken too.

>>> Complete stab in the dark (not compile tested) here:
>>>
>>> http://www.FreeBSD.org/~jhb/patches/i386_fpu_suspend.patch

I forget many details of how this works, but noticed that it seems
to break consistency of the state for the !fxsave case and related
locking.

% Index: i386/i386/swtch.s
% ===================================================================
% --- i386/i386/swtch.s	(revision 262711)
% +++ i386/i386/swtch.s	(working copy)
% _at__at_ -417,42 +417,9 _at__at_
%  	str	PCB_TR(%ecx)
% 
%  #ifdef DEV_NPX
% -	/*
% -	 * If fpcurthread == NULL, then the npx h/w state is irrelevant and the
% -	 * state had better already be in the pcb.  This is true for forks
% -	 * but not for dumps (the old book-keeping with FP flags in the pcb
% -	 * always lost for dumps because the dump pcb has 0 flags).
% -	 *
% -	 * If fpcurthread != NULL, then we have to save the npx h/w state to
% -	 * fpcurthread's pcb and copy it to the requested pcb, or save to the
% -	 * requested pcb and reload.  Copying is easier because we would
% -	 * have to handle h/w bugs for reloading.  We used to lose the
% -	 * parent's npx state for forks by forgetting to reload.
% -	 */

This function is mostly bogus (see old mails).

% -	pushfl
% -	CLI
% -	movl	PCPU(FPCURTHREAD),%eax
% -	testl	%eax,%eax
% -	je	1f

This CLI/STI locking is bogus.  Accesses to FPCURTHREAD are now locked
by critical_enter(), as on amd64, and perhaps a higher level already
did critical_enter() or even CLI.

(CLI/STI in swtch.s seems to be bogus too.  amd64 doesn't do it, and
I think a higher level does mtx_lock_spin() which does too much, including
CLI via spinlock_enter().)

% -
% -	pushl	%ecx
% -	movl	TD_PCB(%eax),%eax
% -	movl	PCB_SAVEFPU(%eax),%eax
% -	pushl	%eax
% -	pushl	%eax
% -	call	npxsave
% +	pushl	PCB_FPUSUSPEND(%ecx)
% +	call	npxsuspend

Without fxsave, npxsuspend() cannot be atomic without locking, since
fnsave destroys the state in the FPU and you either need a lock to
reload the old state atomically enough, or a lock to modify FPCURTHREAD
atomically enough.  Reloading the old state is problematic because
the reload might trap.  So the old version uses the second method.
It calls npxsave() to handle most of the details.  But npxsave() was
designed to be efficient for its usual use in cpu_switch(), so it doesn't
handle the detail of checking FPCURTHREAD or the locking needed for this
check, so the above code had to handle these details.

%  	addl	$4,%esp
% -	popl	%eax
% -	popl	%ecx
% -
% -	pushl	$PCB_SAVEFPU_SIZE
% -	leal	PCB_USERFPU(%ecx),%ecx
% -	pushl	%ecx
% -	pushl	%eax
% -	call	bcopy
% -	addl	$12,%esp
% -1:
% -	popfl
%  #endif	/* DEV_NPX */

This probably should never have been written in asm.  Only the similar
code in cpu_switch() is time-critical.

% 
%  	movl	$1,%eax
% ...
% _at__at_ -520,7 +490,16 _at__at_
%  	movl	%eax,%dr7
% 
%  #ifdef DEV_NPX
% -	/* XXX FIX ME */
% +	/* Restore FPU state */

Is the problem just this missing functionality?

% ...
% Index: i386/isa/npx.c
% ===================================================================
% --- i386/isa/npx.c	(revision 262711)
% +++ i386/isa/npx.c	(working copy)

This has many vestiges of support for interrupt handling (mainly in
comments and in complications in the probe).  CLI/STI was used for
locking partly to reduce complications for the IRQ13 case.  The
comment before npxsave() still says that it needs CLI/STI locking
by callers, but it actually needs critical_enter() locking and
most callers only provided that.

% _at__at_ -761,7 +761,34 _at__at_
%  	PCPU_SET(fpcurthread, NULL);
%  }
% 
% +/*
% + * Unconditionally save the current co-processor state across suspend and
% + * resume.
% + */
%  void
% +npxsuspend(union safefpu *addr)
% +{
% +	register_t cr0;
% +
% +	if (!hw_float)
% +		return;
% +	cr0 = rcr(0);
% +	clts();
% +	fpusave(addr);
% +	load_cr(0, cr0);
% +}

In the !fxsave case, this destroys the state in the npx, leaving
fpcurthread invalid.  It also does the save when the state in the
npx is inactive.  I think jkim intentionally this state so that
resume can load it unconditionally.  It must be arranged that there
are no interactions with fpcurthread.  This doesn't work so well
without fxsave.  When fpcurthread != NULL, reloading CR0 keeps
CR0_TS and thus ensures that inconsistent state lives for longer.
Things will only be OK if fpcurthread isn't changed until resume.

You can probably fix this by using the old code here.  The old code
doesn't need the hw_float test, since fpcurthread != NULL implies
hw_float != 0.

Actually, I don't see any need to change anything on i386 -- after
storing the state for the thread, there should be no need to store it
anywhere else across suspend/resume.  We intentionally use this method
(even on amd64 IIRC), although it is suboptimal, to reduce complications
for context switchres and signal handling.  npxsave() takes an address,
but savectx() didn't abuse this to store directly in the special save
area.  It made npxsave() store in the pcb, and then copied to the special
area.

% +
% +void
% +npxresume(union savefpu *addr)
% +{
% +
% +	if (!hw_float)
% +		return;
% +	fninit();
% +	fpurstor(addr);
% +}

The old version seems to have been almost correct in not restoring any
FPU state.  All the state has been saved in PCBs and none should be
restored.  Copying it was not useful for suspend/resume, but might be
useful for other things.

You need that fninit() to avoid exceptions in fpurstor() in case the
inactive but initially clean state in the npx was corrupted by
suspend/resume.

I see a problem in npxdna().  It doesn't clean the state in the npx
before fpurstor() since it "knows" that the current (inactive) state
is clean so it doesn't need to waste time cleaning it.  The state is
normally cleaned by fnsave for context switches (sometimes it is cleaned
using fnclex() by npxdrop() and friends, since these are perhaps
excessively optimized so they don't use the usual fnsave mechanism).
This knowledge is wrong if the state was corrupted by suspend/ resume.

Try replacing all of these i386 changes by just an fninit instruction
at the resume point.  This has a chance of helping even in the fxsave
case.  npxdna() has pessimizations to do quite different cleaning in
the fxsave case.  It calls fpu_clean_state() to work around bugfeatures
in some AMD CPUs, even on non-AMD CPUs.  This clears exceptions as a
side effect, but it doesn't do a full fninit and it is not clear than
fnclex is enough for a state corrupted by suspend/resume.  Without
this pessimization, npxdna() would only do the SSE instruction fxrestor,
and this doesn't need cleaning to use.

Suspend/resume doesn't seem to have any direct support for %mxcsr.  It
seems to depend on fxrstor to load a good one.  I think a running with
a bad one is harmless provided no SSE instructions are executed before
an fxrstor to load a good one.

x87 resume should probably repeat all the initialization except for the
probe (busy latch stuff ...).  Maybe booting doesn't require that either,
but if it does then resume might too.

% +
% +void
%  npxdrop()
%  {
%  	struct thread *td;

Bruce