Re: patches to fix "ps -M" as used in crashinfo(8)

From: Bruce Cran <bruce_at_cran.org.uk>
Date: Wed, 26 Aug 2009 22:35:31 +0100
On 26/08/2009 13:44, John Baldwin wrote:
> On Monday 24 August 2009 6:01:45 pm Bruce Cran wrote:
>    
>> I've recently been debugging a series of problems with running ps(1)
>> on crash dumps, and now have a couple of patches: the bugs cause
>> ps(1) to crash while crashinfo(8) is being run during boot, dumping a
>> 1GB ps.core file in the root filesystem.
>>
>> The patches are at
>> http://www.cran.org.uk/~brucec/freebsd/pr137890.kvm_proc.c.diff and
>> http://www.cran.org.uk/~brucec/freebsd/pr137890.ps.c.diff
>>
>> The problem with ps.c is that like pkill(1) and w(1), they all
>> initialize the execfile argument to kvm_open or kvm_openfiles to
>> "/dev/null" instead of NULL, causing the default usage of "ps
>> -M /var/crash/vmcore.x" to fail because libkvm fails to
>> fstat /dev/null. They only work if "-N" is also specified.
>>      
> Note that crashinfo specifies both -M and -N:
>
> echo "------------------------------------------------------------------------"
> echo "ps -axl"
> echo
> ps -M $VMCORE -N $KERNEL -axl
> echo
>    

I realised that just after posting, when I checked how crashinfo runs ps.
When I saw the segfault at bootup I think I just ran
"ps -ax -M /var/crash/vmcore.x", saw it segfault too, and jumped to the
wrong conclusion. In the end there were a couple of ways to get it to
crash, and I'm not convinced I've found them all yet.

> I'm not sure that 'ps -M blah' without '-N' should really work.
> Also, I'm not sure how fstat() of /dev/null could fail?
>    

The documentation (for ps and for the equivalent parameter of kvm_open)
seems to say that if you don't specify "-N" then the currently running
kernel is used, as specified by getbootfile(3). I don't know if that
makes sense or not.

The code which involved fstat was in __aout_fdnlist in
lib/libc/gen/nlist.c:

/* check that file is at least as large as struct exec! */
if ((_fstat(fd, &st) < 0) || (st.st_size < sizeof(struct exec)))
     return (-1);

I guess it was the second check that was failing and causing the
function to return, and not the fstat call: /dev/null is a character
device whose st_size is 0, which is smaller than sizeof(struct exec).
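
A minimal sketch of that check in isolation, to show which half of the
condition trips for /dev/null. The helper name check_exec_size, its
return codes and MIN_HEADER_SIZE are all made up for illustration;
MIN_HEADER_SIZE is only a stand-in for sizeof(struct exec), and this is
not the libc code itself.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: stand-in for sizeof(struct exec) from <a.out.h>. */
#define MIN_HEADER_SIZE 32

/*
 * Hypothetical helper mirroring the two-part test in __aout_fdnlist.
 * Returns  0 if the file is large enough to hold a header,
 *         -1 if fstat() itself fails,
 *         -2 if fstat() succeeds but the file is too small,
 *         -3 if the file cannot be opened at all.
 */
static int
check_exec_size(const char *path)
{
	struct stat st;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-3);
	if (fstat(fd, &st) < 0) {
		close(fd);
		return (-1);
	}
	close(fd);
	/* Second half of the original condition: the size comparison. */
	if (st.st_size < (off_t)MIN_HEADER_SIZE)
		return (-2);
	return (0);
}
```

Calling check_exec_size("/dev/null") should take the -2 path rather
than the -1 path, since fstat succeeds on the device node and reports a
size of 0.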

> The kvm_nlist() bug in libkvm should probably still be fixed, and the
> ngroups one you might want to poke brooks_at_ about.
>
>    

-- 
Bruce Cran
Received on Wed Aug 26 2009 - 19:35:40 UTC
