Re: ldd leaves the machine unresponsive

From: jhell <jhell_at_DataIX.net>
Date: Sat, 20 Mar 2010 07:27:43 -0400
On Fri, 19 Mar 2010 17:15, Anton Shterenlikht wrote:
In Message-Id: <20100319211535.GA76683_at_mech-cluster241.men.bris.ac.uk>

> On Thu, Mar 18, 2010 at 11:29:36AM -0400, jhell wrote:
>>
>> On Wed, 17 Mar 2010 12:32, Anton Shterenlikht wrote:
>> In Message-Id: <20100317163230.GJ87732_at_mech-cluster241.men.bris.ac.uk>
>>
>>> Just updated to ia64 r205248
>>>
>>> If my problem is due to my mis-configuration,
>>> I apologise in advance.
>>>
>>> I run this shell script after each upgrade
>>> and 'make delete-old-libs' to check
>>> if any shared objects need to be rebuilt:
>>>
>>> <start script>
>>>
>>> #!/bin/sh
>>>
>>> for file in `find /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local -name "*"`
>>> do
>>>        echo $file
>>>        ldd $file >> /root/ldd_results 2> /dev/zero
>>> done
>>>
>>> <end script>
>>>
>>
>> This will probably do closer to what you actually would want to look for.
>>
>> Writing to /dev/zero ... I don't know, I have never tried it, since /dev/null
>> is the standard place to throw trash.
>>
>> #!/bin/sh
>> for file in `find /*bin /usr/*bin /usr/lib* /usr/local/*bin -type f`; do
>>  	echo $file
>>  	ldd $file >>/root/ldd_results 2>/dev/null
>> done
>>
>> The problem with your script is that most of the files it finds are ones
>> that ldd cannot, or cannot usefully, be run on, so it leaves you junk in return.
>>
>> It might be more useful if you searched for dynamically linked ELF
>> binaries to run ldd against like the following.
>>
>> === Script starts here ===
>> #!/bin/sh
>>
>> SEARCHPATH="/*bin /usr/*bin /usr/lib* /usr/local/*bin"
>>
>> trap 'exit 1' 2
>>
>> check_libs() {
>> for spath in $SEARCHPATH; do
>>          for ifelf in `find $spath -type f`; do
>>                  ldd `file $ifelf | grep dynamically | cut -f1 -d:`
>>          done
>> done
>> }
>>
>> check_libs 2>/dev/null
>> === Script ends here ===
>>
>> The above will find all dynamically linked ELF files under the paths listed
>> in the SEARCHPATH variable, run ldd on them, and print the results to stdout.
>>
>> Obviously since you are going to have thousands of files being questioned,
>> stdout is not going to be useful.
>>
>> So with the above stated:
>> save the script to: checklibs.sh
>> run with: "sh checklibs.sh >/root/checklibs_output"
>> or: "script /root/checklibs_output sh checklibs.sh"
>>
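A variant of the quoted script, as a rough sketch only: it tests each file
with file(1) first and calls ldd only when the file is reported as
dynamically linked, so ldd is never run with an empty argument. The function
name scan_dynamic is made up for illustration, and the sketch assumes file(1)
prints "dynamically linked" for such objects, as the grep in the quoted
script also assumes.

```shell
#!/bin/sh
# Sketch: run ldd only on dynamically linked ELF files under the given paths.
# scan_dynamic is a hypothetical helper name, not part of the original script.
scan_dynamic() {
    for spath in "$@"; do
        # Read one path per line so names containing spaces survive intact.
        find "$spath" -type f 2>/dev/null | while IFS= read -r f; do
            # Assumption: file(1) says "dynamically linked" for these objects.
            if file "$f" 2>/dev/null | grep -q 'dynamically linked'; then
                echo "$f:"
                ldd "$f" 2>/dev/null
            fi
        done
    done
}
```

It could be run as, for example:
scan_dynamic /bin /sbin /usr/bin /usr/sbin /usr/lib /usr/libexec /usr/local >/root/checklibs_output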
>>> After the upgrade to r205248, the script
>>> freezes at seemingly random points.
>>>
>>
>> Unneeded disk usage & execution.
>>
>>> I can still ssh to the machine (using keys), i.e.
>>> I see the welcome message, but cannot get to the console prompt.
>>
>> Of course... too many open files or processes in wait. SSH already has the
>> information it needs loaded into memory, that's why you can get sort-of-in.
>>
>> ZFS file-system perhaps ?
>
> I've no ZFS.
>
> I'm seeing very similar behaviour now with csup:
>
> ( I do csup -L2 /root/ports-supfile, where
>
> # cat /root/ports-supfile
> *default host=cvsup.uk.FreeBSD.org
> *default base=/var/db
> *default prefix=/usr
> *default release=cvs tag=. delete use-rel-suffix compress
>
> ports-all
> # )
>
> top(1) shows:
>
> last pid:  1160;  load averages:  0.00,  0.06,  0.07    up 0+00:10:53  15:05:52
> 81 processes:  3 running, 61 sleeping, 17 waiting
> CPU 0:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
> CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 23M Active, 19M Inact, 75M Wired, 136K Cache, 34M Buf, 5900M Free
> Swap: 2780M Total, 2780M Free
>
>  PID    UID    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>   10      0      2 171 ki31     0K    64K RUN     0  20:18 198.00% idle
>   11      0     17 -48    -     0K   544K WAIT    0   0:01  0.00% intr
> 1118   1001      1  96    0 12800K  3920K CPU0    0   0:00  0.00% top
>    4      0      1  -8    -     0K    32K -       1   0:00  0.00% g_down
> 1158      0      4  -8    0 43672K  6296K biowr   0   0:00  0.00% csup
>
>
> which stays in biowr state indefinitely.
>
> I can issue kill -9 or kill -HUP from top(1),
> which makes csup change state to STOP, but
> nothing else happens.
>
> As before, I can't log in from other terminals
> and have to do a cold reset. I've reinstalled
> on another disk, so not sure what's going on.
>
> I think rm(1) is also extremely slow, but
> maybe I'm imagining things.
>
> many thanks
> anton
>
>


I would post the contents of your make.conf, your kernel config, and your
dmesg somewhere so they can be evaluated.
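
One possible way to gather those into a single file for posting, as a minimal
sketch: collect_report is a made-up helper name, and the paths in the usage
line are assumptions (MYKERNEL is a placeholder for whatever your kernel
config file is actually called).

```shell
#!/bin/sh
# Sketch: concatenate several files into one report, each under a header line.
# collect_report is a hypothetical helper name used only for illustration.
collect_report() {
    out=$1
    shift
    : > "$out"                      # create or truncate the report file
    for f in "$@"; do
        echo "=== $f ===" >> "$out"
        if [ -r "$f" ]; then
            cat "$f" >> "$out"
        else
            # Note missing or unreadable files instead of failing silently.
            echo "(not readable)" >> "$out"
        fi
    done
}
```

For example:
dmesg > /tmp/dmesg.txt
collect_report /tmp/report.txt /etc/make.conf /usr/src/sys/ia64/conf/MYKERNEL /tmp/dmesg.txt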

Regards,

-- 

  jhell
Received on Sat Mar 20 2010 - 10:28:30 UTC
