Re: Boot failure: panic: No heap setup

From: Toomas Soome <tsoome_at_me.com>
Date: Fri, 30 Mar 2018 21:10:31 +0300
> On 30 Mar 2018, at 18:03, Stefan Esser <se_at_freebsd.org> wrote:
> 
> Am 29.03.18 um 07:15 schrieb Toomas Soome:
>> 
>> 
>>> On 29 Mar 2018, at 01:06, Stefan Esser <se_at_freebsd.org> wrote:
>>> 
>>> Am 28.03.18 um 22:28 schrieb Warner Losh:
>>>>> Hmmm, the code references point into the boot loader code - I had
>>>>> expected that there is a problem in the kernel, not the boot loader.
>>>>> 
>>>>>> [1]
>>>>>> https://svnweb.freebsd.org/base/head/stand/libsa/sbrk.c?view=markup#l56
>>>>> 
>>>>> 
>>>>> Seems that setbase has either not been called or has been called with
>>>>> base=0.
>>>> 
>>>>   Right, which is odd...
>>>> 
>>>>>> [2]
>>>>>> https://svnweb.freebsd.org/base/head/stand/i386/zfsboot/zfsboot.c?view=markup#l688
>>>>> 
>>>>> 
>>>>> I had thought, that the zfs boot code has been initialized before the
>>>>> menu is displayed?
>>>> 
>>>>   Right, all of this should be done looooong before we get to the
>>>>   interpreter. Can you break into the loader prompt and try the `heap`
>>>>   command, see what that outputs? CC'ing imp_at_ because he actually knows
>>>>   things.
>>>> 
>>>> Totally weird. I'd add a printf to the setheap() function to display its args
>>>> and see if you get this panic before/after that printf...
>>> 
>>> I'm currently using a Forth-enabled boot loader again, since this is a
>>> "production" machine (my home server, which also receives and keeps all
>>> my work email, for example).
>>> 
>>> I'll build a clean world with the LUA loader and test it on one of the
>>> next days. Tests will include the "heap" loader command and I'll add the
>>> printf (though, if sbrk() has really not been called, I guess that will
>>> not go too well ...).
>>> 
>>> Is it possible that the setheap function is called a second time, just
>>> before jumping into the kernel? (In that case adding the printf might
>>> crash the loader in the first setheap call ...)
>>> 
>>> Since the loader menu (and escaping from the menu) works, there must be
>>> a valid heap, at that time.
>>> 
>> 
>> Indeed. And assuming the message really is from the loader, it means there
>> must be memory corruption. If so, you can check which variables are located
>> close to the heap-related ones… Also, since you have a working menu, it has
>> to be related to the actual loading. Since the loading itself has been working
>> so far, it should be related to the lua-specific bits that prepare to call
>> the load functions.
> 
> Ok, some more data points:
> 
> 1) A printf in setheap reported plausible values during start-up of zfsboot.
>   The menu appeared and wiped away the values so fast that I could not take
>   a photo or write them down.
> 


If you got the menu and so on, it means that the heap was OK at that point. bcache_init() is called right after setheap(), and it allocates memory too.
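
To illustrate why a working menu implies a valid heap, here is a minimal sketch in C. It assumes a simplified bump allocator; model_setheap() and model_sbrk() are hypothetical names, this is not the libsa source, and where the real sbrk() would panic the sketch only prints the message and returns NULL.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified model of a setheap()/sbrk() pair. NOT the libsa code. */
static char  *heapbase;  /* start of the heap, NULL until model_setheap() runs */
static size_t maxheap;   /* total heap size in bytes */
static size_t heapsize;  /* bytes handed out so far */

/* Record the heap boundaries; nothing can be allocated before this runs. */
void model_setheap(void *base, void *top)
{
    heapbase = base;
    maxheap = (size_t)((char *)top - (char *)base);
    heapsize = 0;
}

/*
 * Hand out the next incr bytes of the heap.
 * Assumption: the real code panics when no base is set; here we only
 * print the message and return NULL so the caller can observe it.
 */
void *model_sbrk(size_t incr)
{
    void *p;

    if (heapbase == NULL) {
        fprintf(stderr, "panic: No heap setup\n");
        return NULL;
    }
    if (incr > maxheap - heapsize)
        return NULL;            /* heap exhausted */
    p = heapbase + heapsize;
    heapsize += incr;
    return p;
}
```

In this model any allocation made after model_setheap() succeeds, so anything that allocates at start-up and still works (the menu, bcache) shows that the heap parameters were valid at that time.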

What you can do is escape from the menu to the OK prompt and check the output of the heap and biosmem commands…


> 2) I have rebuilt world and kernel based on r331763. Booting resulted in the
>   same panic as reported before. There was no debug output from the patched
>   setheap call before the panic (which indicates that it was not called a
>   second time).
> 
> 3) In order to get my system to boot, I interrupted loading of zfsloader and
>   forced loading of the previous version (from a world build with Forth in
>   the loader). Booting succeeded with the latest kernel ...
> 
> It looks as if sbrk() was called in zfsloader before setheap() has been used
> to initialize the heap parameters, if lua is enabled instead of Forth. See
> stand/i386/loader/main.c:124 for the location of the setheap call in the
> loader.

This can only happen if something is called before main()…
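
As a general illustration of code running before main(), here is a small C sketch using the GCC/Clang constructor attribute. It only shows the mechanism in a hosted program; whether the loader build has anything like this is an assumption I have not checked, and all names in it are hypothetical.

```c
#include <stddef.h>

static int heap_ready;        /* would be set by the init step in main() */
static int early_alloc_seen;  /* set if an allocation ran before that init */

/* Stand-in allocator: only records that it was called too early. */
static void *sketch_alloc(void)
{
    if (!heap_ready)
        early_alloc_seen = 1;
    return NULL;
}

/* GCC/Clang extension: this function runs before main() is entered. */
__attribute__((constructor))
static void runs_before_main(void)
{
    (void)sketch_alloc();
}
```

By the time main() starts, early_alloc_seen is already 1, which is the kind of ordering that would let an allocation happen before the heap is set up.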

> 
> This is obviously hard to debug, though, since printf cannot be called at that
> point. A pure write(2) should be possible without heap, but since the console
> has not been initialized at the point of the setheap invocation, there is no
> working output device, AFAIK.
> 
> I do not see how any sbrk() call could occur before setheap is called. And
> there does not appear to be any other setheap function (or macro) in the
> tree, that could overload the one defined in stand/libsa/sbrk.c ...
> 
> I have no idea how to proceed from here ...
> 
> But now I'm sure it is a problem in zfsloader (or loader in general?).
> 
> Hmmm: How is the panic message printed by sbrk() without an initialized heap?
> The definition of panic in stand/libsa/panic.c relies on a working printf!
> 
> I should be able to use printf in the same way as panic does, but I did
> not succeed when I tried to use it early in zfsloader ...
> 
> Regards, Stefan


rgds,
toomas
Received on Fri Mar 30 2018 - 16:10:51 UTC