Re: A reliable port cross-build failure (hangup) in my context (amd64->armv7 cross build, with native-tool speedup involved)

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 4 Jan 2019 01:17:52 -0800

On 2019-Jan-3, at 22:56, Michal Meloun <melounmichal at gmail.com> wrote:

> On 29.12.2018 18:47, Dennis Clarke wrote:
>> On 12/28/18 9:56 PM, Mark Millard via freebsd-arm wrote:
>>> 
>>> On 2018-Dec-28, at 12:12, Mark Millard <marklmi at yahoo.com> wrote:
>>> 
>>>> On 2018-Dec-28, at 05:13, Michal Meloun <melounmichal at gmail.com>
>>>> wrote:
>>>> 
>>>>> Mark,
>>>>> this is a known problem with qemu-user-static.
>>>>> Emulation of every single interruptible syscall is broken by design (it
>>>>> has signal-related races). These races cannot be solved without a major
>>>>> rewrite of the syscall emulation code.
>>>>> Unfortunately, nobody is actively working on this, I think.
>>>>> 
>> 
>> Following along here quietly and I had to blink at this a few times.
>> Is there a bug report somewhere within the qemu world related to this
>>  'broken by design' qemu feature?
> 
> Firstly, I apologize for the late answer. Writing a technically accurate but
> still comprehensible report is extremely difficult for me.

Thanks for doing so.

> . . .
> Mark, I hope that this is also the answer to your question posted to
> hackers_at_ and also the explanation of why you see the hang.

Again thanks: it was helpful for my gaining some understanding of
the code structure.

But it turns out that another of your list of problems is involved
in the hang-up:

> . . .
> - And the last major one. At this time, all guest structures are maintained
> by hand. Due to the huge number of these structures, this is an extremely
> error-prone approach.  We should convert this to script-generated code,
> including the guest syscall definitions.

It turns out that "struct target_cmsghdr" has the wrong overall size,
the wrong first field size, and the wrong offsets for later fields
for amd64->aarch64 use (or likely any 64-bit->64-bit host-target
pair, even amd64->x86_64). In fact the code reports via:

          gemu_log("Unsupported ancillary data: %d/%d\n",
              cmsg->cmsg_level, cmsg->cmsg_type);


because of cmsg->cmsg_level and cmsg->cmsg_type ending up with
messed up values. It hangs after that message shows up. The
more complete code containing that gemu_log call is:

      if ((cmsg->cmsg_level == TARGET_SOL_SOCKET) &&
          (cmsg->cmsg_type == SCM_RIGHTS)) {
          int *fd = (int *)data;
          int *target_fd = (int *)target_data;
          int i, numfds = len / sizeof(int);

          for (i = 0; i < numfds; i++) {
              fd[i] = tswap32(target_fd[i]);
          }
      } else if ((cmsg->cmsg_level == TARGET_SOL_SOCKET) &&
          (cmsg->cmsg_type == SCM_TIMESTAMP) &&
          (len == sizeof(struct timeval)))  {
          /* copy struct timeval to host */
          struct timeval *tv = (struct timeval *)data;
          struct target_freebsd_timeval *target_tv =
              (struct target_freebsd_timeval *)target_data;
          __get_user(tv->tv_sec, &target_tv->tv_sec);
          __get_user(tv->tv_usec, &target_tv->tv_usec);
      } else {
          gemu_log("Unsupported ancillary data: %d/%d\n",
              cmsg->cmsg_level, cmsg->cmsg_type);
          memcpy(data, target_data, len);
      }

Of the 3 types of hangups that I've run into recently, one was from a
missing statement, one was from struct target_kevent having the
wrong overall size and wrong field offsets after the first field
(amd64->armv7 was an example), and one involved struct
target_cmsghdr, as described above. (There may be more to the
target_cmsghdr one.)

> Again, my apologies for the slightly (or very) chaotic report, but this is
> the best I am capable of.

Not chaotic in my view.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Fri Jan 04 2019 - 08:18:06 UTC
