Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

From: Jilles Tjoelker <jilles_at_stack.nl>
Date: Sun, 6 Jan 2019 00:27:20 +0100
On Fri, Jan 04, 2019 at 07:56:42AM +0100, Michal Meloun wrote:
> On 29.12.2018 18:47, Dennis Clarke wrote:
> > On 12/28/18 9:56 PM, Mark Millard via freebsd-arm wrote:
> >>
> >> On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:
> >>
> >>> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com>
> >>> wrote:
> >>>
> >>>> Mark,
> >>>> this is a known problem with qemu-user-static.
> >>>> Emulation of every single interruptible syscall is broken by design (it
> >>>> has signal-related races). These races cannot be solved without a major
> >>>> rewrite of the syscall emulation code.
> >>>> Unfortunately, nobody actively works on this, I think.
> >>>>

> > Following along here quietly and I had to blink at this a few times.
> > Is there a bug report somewhere within the qemu world related to this
> >  'broken by design' qemu feature?

> Firstly, I apologize for the late answer. Writing a technically accurate
> but still comprehensible report is extremely difficult for me.

> The major design issue with qemu-user is that guest (blocking /
> interruptible) syscalls must be emulated atomically, including the
> delivery of asynchronous signals (including signals originating from
> other threads).
> This is something that cannot be emulated precisely by a user-mode
> program without specific kernel support. Let me explain this in a
> little more detail.

> [snip]

> This looks much better. The code blocks all signals first, then checks
> if any signal is pending. If yes, it does a non-blocking select()
> (because the timeout is zero) and correctly returns EINTR immediately.
> Otherwise, it uses another variant of select(), pselect(), which sets
> the right signal mask itself.
> That means the syscall is called with signal delivery blocked, but the
> kernel installs the right sigmask before it waits for an event. While
> this looks like a perfect solution and this code closes all races from
> the first version, it is not one. pselect() has different semantics
> than select(): it doesn't update the timeout argument. So this
> solution is also inappropriate.

FreeBSD select() never updates the passed timeout. When emulating Linux
syscalls, this will have to be done manually.
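
A minimal sketch of what "done manually" could look like: measure the
time spent in select() with a monotonic clock and write the remainder
back into the caller's timeout, as Linux does. The helper name
select_linux_timeout() is hypothetical and not taken from qemu, and the
sketch ignores the signal-mask races discussed above.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() under strict modes */
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: select() that stores the unslept time in *timeout.
 * The host call works on a private copy, so the result does not depend on
 * whether the host's select() modifies its timeout argument.
 */
int
select_linux_timeout(int nfds, fd_set *rfds, fd_set *wfds, fd_set *efds,
    struct timeval *timeout)
{
	struct timespec start, end;
	struct timeval host_tv, left;
	time_t sec;
	long nsec;
	int ret, saved_errno;

	if (timeout == NULL)
		return (select(nfds, rfds, wfds, efds, NULL));
	assert(timeout->tv_usec >= 0 && timeout->tv_usec < 1000000);

	host_tv = *timeout;
	clock_gettime(CLOCK_MONOTONIC, &start);
	ret = select(nfds, rfds, wfds, efds, &host_tv);
	saved_errno = errno;	/* clock_gettime() must not clobber it */
	clock_gettime(CLOCK_MONOTONIC, &end);

	/* Elapsed time, normalized so that 0 <= nsec < 1e9. */
	sec = end.tv_sec - start.tv_sec;
	nsec = end.tv_nsec - start.tv_nsec;
	if (nsec < 0) {
		sec--;
		nsec += 1000000000L;
	}

	/* Remaining time = requested - elapsed, clamped at zero. */
	left.tv_sec = timeout->tv_sec - sec;
	left.tv_usec = timeout->tv_usec - nsec / 1000;
	if (left.tv_usec < 0) {
		left.tv_sec--;
		left.tv_usec += 1000000;
	}
	if (left.tv_sec < 0) {
		left.tv_sec = 0;
		left.tv_usec = 0;
	}
	*timeout = left;

	errno = saved_errno;
	return (ret);
}
```

The same bookkeeping would apply around pselect(), with the timespec
converted back to the guest's timeval afterwards.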

> Moreover, I think, we don't have p<foo> equivalents for all blocking
> syscalls.

We definitely do not. For example, open() has no equivalent with a
signal mask.

> Mark, I hope that this is also the answer to your question posted to
> hackers_at_ and also the explanation of why you see the hang.

> Linux uses a different approach to overcome this issue, safe_syscall ->
> https://gitlab.collabora.com/tomeu/qemu/commit/4d330cee37a21aabfc619a1948953559e66951a4
> It looks like a workable workaround, but I'm not sure about ERESTART
> versus EINTR return values. Imho, this can be a problem.

This looks like a reasonable solution. Musl libc uses the same approach
to implement pthread cancellation (where with the default "deferred"
cancellation type, cancellation takes effect at cancellation points
only, which include most blocking system calls; if a cancellation
request comes in at the same time as a blocking cancellation point
system call starts, the same race condition needs to be avoided).

As for ERESTART vs EINTR, EINTR can be treated like any other error. On
the other hand, ERESTART (or variants like ERESTARTSYS) is never
returned by the kernel, but instead causes the kernel to rewind the
program counter (so the system call instruction will be executed again)
just before invoking the signal handler. Therefore, when the host kernel
does this to qemu, qemu must do the same to the guest.

If a signal is delivered just before qemu makes a system call on behalf
of the guest, this may look like ERESTART. This is fine since it looks
the same as if the signal was delivered just before the guest's system
call instruction.

The approach as used by FreeBSD libc to implement pthread cancellation
(thr_wake(2) on self in the signal handler) will not let you distinguish
between ERESTART and EINTR, so you would have to replicate that
determination (which typically but not always depends on the signal's
SA_RESTART flag and which system call it is). Therefore, I would not
recommend that approach.

> I have a list of other qemu-user problems (I mean mainly the bsd-user
> part of the qemu code here), not counting normal coding bugs:
> - the code is not thread safe but is used in a threaded environment (rw
> locks for example),
> - emulating some sysctls and resource limit / usage behavior is very
> hard (mainly if we emulate a 32-bit guest on a 64-bit host)

In many such cases, the proper behaviour can be found in the kernel code
(when a 64-bit kernel needs to handle a system call from a 32-bit
process).

I expect problems with getdirentries() and struct dirent.d_off with
filesystems that return hashed filenames as positions.

> - if the host syscall returns ERESTART, we should do a full unroll and
> pass it to the guest.

Yes (with the above mentioned caveats about how ERESTART is returned).

> - the syscall emulation should not use the libc functions, but the
> syscall instruction directly. Libc shims can have side effects, so we
> should not execute them twice: once in the guest, a second time in the
> host.

If you accept that your code is going to be more tightly coupled to libc
and the kernel than most applications, calling system calls directly
should be fine. This will also allow you to install your own handler for
SIGTHR if you do not want to remap it. Do not expect pthread
cancellation and suspension to work properly in such a configuration,
though.
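
For illustration only, the closest portable approximation is the generic
syscall(2) entry point, which skips the per-call libc wrappers. It is
still a libc stub (it sets errno, for example), so an emulator that
needs full control would use its own assembly trampoline instead. The
names raw_getpid() and raw_write() are hypothetical.

```c
#define _DEFAULT_SOURCE 1	/* glibc: declare syscall(); harmless elsewhere */
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke getpid by number instead of through the libc getpid() wrapper. */
long
raw_getpid(void)
{
	return ((long)syscall(SYS_getpid));
}

/* Invoke write by number; returns -1 with errno set on failure. */
long
raw_write(int fd, const void *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	return ((long)syscall(SYS_write, fd, buf, len));
}
```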

> - and the last major one. At this time, all guest structures are
> maintained by hand. Due to the huge number of these structures, this is
> an extremely error-prone approach. We should convert this to
> script-generated code, including the guest syscall definitions.

Definitions of system calls are in syscalls.master and should be
automatically processable; definitions of types are in header files and
cannot really be processed other than by a C compiler.
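
As a sketch of how mechanical the syscalls.master half can be, the
following parses one entry of the form
"number audit-event type { prototype }". It assumes that backslash
continuation lines have already been joined and that comment and
#include lines are simply skipped; struct sysent_line and
parse_master_line() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one syscalls.master entry. */
struct sysent_line {
	int number;
	char audit[32];
	char type[32];
	char proto[256];
};

/* Returns 0 on success, -1 if the line is not a parsable entry. */
int
parse_master_line(const char *line, struct sysent_line *out)
{
	const char *open, *close;
	size_t len;

	assert(line != NULL && out != NULL);

	/* Skip comments and preprocessor lines. */
	if (line[0] == ';' || line[0] == '#')
		return (-1);
	if (sscanf(line, "%d %31s %31s", &out->number, out->audit,
	    out->type) != 3)
		return (-1);

	/* The prototype is whatever sits between the outer braces. */
	open = strchr(line, '{');
	close = strrchr(line, '}');
	if (open == NULL || close == NULL || close < open)
		return (-1);
	open++;
	while (*open == ' ' || *open == '\t')
		open++;
	while (close > open && (close[-1] == ' ' || close[-1] == '\t'))
		close--;

	len = (size_t)(close - open);
	if (len >= sizeof(out->proto))
		return (-1);
	memcpy(out->proto, open, len);
	out->proto[len] = '\0';
	return (0);
}
```

Turning the extracted prototype into argument-conversion code still
needs the type definitions, which is the part that only a C compiler can
really process.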

> Again, my apologies for the slightly (or very) chaotic report, but this
> is the best I am capable of.

It was clear enough to me.

-- 
Jilles Tjoelker
Received on Sat Jan 05 2019 - 22:27:44 UTC
