Re: boot broken on VMWare somewhere between r300069 and r300176

From: O. Hartmann <ohartman_at_zedat.fu-berlin.de>
Date: Thu, 19 May 2016 19:04:12 +0200
On Thu, 19 May 2016 14:04:59 +0300
Andriy Gapon <avg_at_FreeBSD.org> wrote:

> On 19/05/2016 13:50, Boris Samorodov wrote:
> > On 19.05.16 09:28, K. Macy wrote:
> >> I did an IFC on my drm-next-4.6 branch yesterday at r300069. I just
> >> did an IFC to r300176 and boot will hang right after printing out
> >> "setting hostid: ". ^T just shows sh [piperd]. ddb just shows the
> >> shell as hanging in piperead. Diffing between those two revisions I
> >> don't see any obvious offenders so I'm hoping that individuals who
> >> have committed in the last 24 hours will have some idea of their
> >> changes having such an impact.  
> > 
> > For me (BIOS boot on a DELL notebook) boot is broken after the jump
> > from r300062 to r300158. CapsLock works, but ^T shows nothing.
> > Here is a photo (sorry for the quality):
> > ftp://ftp.wart.ru/pub/misc/boot_broken.jpg
> > 
> > Boot with r300062 works fine.  
> 
> A wild guess (not really), try to revert r300113
> 

We updated several systems of different ages and CPU generations
(around 10) to r300158, all bare metal. All of them failed to boot: they
got stuck after the USB subsystem had been probed (according to the
kernel messages). Some boxes got stuck after the message about the
generation of the UUID occurred. Pushing the power button still performs
a clean shutdown, so the system seems to be alive; but while the power
usually turns off afterwards, this time the box stays stuck at the
uptime message.

On random reboots some of the boxes do boot. But then the disaster
starts. The network is highly unstable and flaky - while I can ping
hosts and resolve their IPs via DNS, I cannot log in via ssh, and the
web services of the web servers on the machines in question are
inaccessible, as are their databases (PostgreSQL).

And it gets more frustrating: I can't update or go back with svn
(either /usr/bin/svn or /usr/local/bin/svn) within the sources to
avoid this mess. In all cases, svn "times out".

Accessing the web from clients running the broken CURRENT code also
turns into a guessing game: sometimes the connection to a service can
be established, sometimes not and I see a timeout. With svn
in /usr/src, on one box I could obtain only a poor fragment of the code
via "svn update -r 300005" (300005 was in my case the starting point
when everything was up and running and working).

In short: reverting back to r300113 isn't possible on most
systems!

This problem appeared immediately after r300158 was introduced and
buildworld/buildkernel had been performed. The fact that different
hardware, including different NICs, is affected means the problem
cannot be narrowed down to a specific NIC, CPU type or other hardware.
And that leads me to the question whether the code injected into
CURRENT gets tested - or not. If there were a test, I guess the
problem would have revealed itself immediately.

I boot via UEFI as well as BIOS - the problem occurs with both.

In such a case as described with a nonworking svn, how am I supposed
to revert to the supposedly working revision r300113?

Kind regards and thanks in advance for your suggestions,

O. Hartmann 


P.S.

I'm using IPFW on all systems. Disabling IPFW (ipfw disable firewall)
seems to relieve the symptoms a bit, no matter whether custom scripts
or the settings from rc.conf were used.

Received on Thu May 19 2016 - 15:02:23 UTC
