On 9/4/18 21:39, Conrad Meyer wrote:
> With current libc, I instead see:
>
> load: 0.10  cmd: blocked_random_poc 1668 [randseed] 1.27r 0.00u 0.00s
> 0% 2328k  (SIGINFO)
>
> $ procstat -kk 1668
>   PID    TID COMM                TDNAME              KSTACK
>  1668 100609 blocked_random_poc  -                   mi_switch+0xd3
> sleepq_catch_signals+0x386 sleepq_timedwait_sig+0x12 _sleep+0x272
> read_random_uio+0xb3 sys_getrandom+0xa3 amd64_syscall+0x940
> fast_syscall_common+0x101
>
> and:
>
> $ truss ./blocked_random_poc
> ...
> getrandom(0x7fffffffd340,40,0)                   ERR#35 'Resource
> temporarily unavailable'
> thr_self(0x7fffffffd310)                         = 0 (0x0)
> thr_kill(100609,SIGKILL)                         = 0 (0x0)
> SIGNAL 9 (SIGKILL) code=SI_NOINFO
>
> So getrandom(2) (via READ_RANDOM_UIO) is returning a bogus EAGAIN
> after we have already slept until random was seeded.  This bubbles up
> to getentropy(3) -> arc4random(3), which sees a surprising failure
> from getentropy(3) and raises KILL against the program.
>
> I believe the EWOULDBLOCK is just a boring leak of tsleep(9)'s timeout
> condition.  This may be sufficient to fix the problem:
>
> --- a/sys/dev/random/randomdev.c
> +++ b/sys/dev/random/randomdev.c
> @@ -156,6 +156,10 @@ READ_RANDOM_UIO(struct uio *uio, bool nonblock)
>                         error = tsleep(&random_alg_context, PCATCH, "randseed", hz/10);
>                         if (error == ERESTART || error == EINTR)
>                                 break;
> +                       /* Squash hz/10 timeout condition */
> +                       if (error == EWOULDBLOCK)
> +                               error = 0;
> +                       KASSERT(error == 0, ("unexpected %d", error));
>                 }
>                 if (error == 0) {
>                         read_rate_increment((uio->uio_resid +
> sizeof(uint32_t))/sizeof(uint32_t));

+markm, re

I think the proposed change is reasonable (note that I think the same
theory applies to the tsleep_sbt() case below as well, which should be
handled similarly).

> Best,
> Conrad
>
>
> On Tue, Sep 4, 2018 at 8:13 PM, Conrad Meyer <cem_at_freebsd.org> wrote:
>> Hi Lev,
>>
>> I took a first attempt at reproducing this problem on a fast
>> desktop-class system.
>> First steps, give us a way to revert back to
>> unseeded status:
>>
>> --- a/sys/dev/random/fortuna.c
>> +++ b/sys/dev/random/fortuna.c
>> @@ -39,6 +39,7 @@ __FBSDID("$FreeBSD$");
>>
>>  #ifdef _KERNEL
>>  #include <sys/param.h>
>> +#include <sys/fail.h>
>>  #include <sys/kernel.h>
>>  #include <sys/lock.h>
>>  #include <sys/malloc.h>
>> @@ -384,6 +385,17 @@ random_fortuna_pre_read(void)
>>                 return;
>>         }
>>
>> +       /*
>> +        * When set, pretend we do not have enough entropy to reseed yet.
>> +        */
>> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_pre_read, {
>> +               if (RETURN_VALUE != 0) {
>> +                       RANDOM_RESEED_UNLOCK();
>> +                       return;
>> +               }
>> +       });
>> +
>> +
>>  #ifdef _KERNEL
>>         fortuna_state.fs_lasttime = now;
>>  #endif
>> @@ -442,5 +454,11 @@ bool
>>  random_fortuna_seeded(void)
>>  {
>>
>> +       /* When set, act as if we are not seeded. */
>> +       KFAIL_POINT_CODE(DEBUG_FP, random_fortuna_seeded, {
>> +               if (RETURN_VALUE != 0)
>> +                       fortuna_state.fs_counter = UINT128_ZERO;
>> +       });
>> +
>>         return (!uint128_is_zero(fortuna_state.fs_counter));
>>  }
>>
>>
>> Second step, enable the failpoints and launch repro program:
>>
>> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='return(1)'
>> debug.fail_point.random_fortuna_pre_read: off -> return(1)
>> $ sudo sysctl debug.fail_point.random_fortuna_seeded='return(1)'
>> debug.fail_point.random_fortuna_seeded: off -> return(1)
>>
>> $ cat ./blocked_random_poc.c
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> int
>> main(int argc, char **argv)
>> {
>>         printf("%x\n", arc4random());
>>         return (0);
>> }
>>
>>
>> $ ./blocked_random_poc
>> ...
>>
>>
>> Third step, I looked at what that process was doing:
>>
>> Curiously, it is not in getrandom() at all, but instead the ARND
>> sysctl fallback.  I probably need to rebuild world (libc) to test this
>> (new libc arc4random based on Chacha).
>>
>> $ procstat -kk 1196
>>   PID    TID COMM                TDNAME              KSTACK
>>  1196 100435 blocked_random_poc  -                   read_random+0x3d
>> sysctl_kern_arnd+0x3a sysctl_root_handler_locked+0x89
>> sysctl_root.isra.8+0x167 userland_sysctl+0x126 sys___sysctl+0x7b
>> amd64_syscall+0x940 fast_syscall_common+0x101
>>
>>
>> When I unblocked the failpoints, it completed successfully:
>>
>> $ sudo sysctl debug.fail_point.random_fortuna_pre_read='off'
>> debug.fail_point.random_fortuna_pre_read: return(1) -> off
>> $ sudo sysctl debug.fail_point.random_fortuna_seeded=off
>> debug.fail_point.random_fortuna_seeded: return(1) -> off
>>
>> ...
>> 9e5eb30f
>>
>>
>> Best,
>> Conrad