On Wed, Jan 21, 2004 at 10:27:30AM -0800, Kris Kennaway wrote:
> On Wed, Jan 21, 2004 at 12:28:27PM -0500, Robin P. Blanchard wrote:
> > I have one -CURRENT client:
> > CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2392.25-MHz 686-class CPU)
> >   Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
> >
> > Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,
> > CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,
> > SSE,SSE2,SS,HTT,TM,PBE>
> >   Hyperthreading: 2 logical CPUs
> > real memory  = 1073610752 (1023 MB)
> > avail memory = 1045266432 (996 MB)
> > ACPI APIC Table: <DELL PE2650 >
> > FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> >
> > which, when installing a new world (via nfs), consistently hangs at the end
> > with:
> >
> > --------------------------------------------------------------
> > >>> Rebuilding man page indices
> > --------------------------------------------------------------
> > cd /usr/src/share/man; make makedb
> > makewhatis /usr/share/man
> >
> >
> > The box is useable at this point, however. I have been simply rebooting the
> > machine, and then running the above commands by hand after the reboot. While
> > 'installworld' is hung (at the end, as above), this is in a 'top':
> >
> > 19107 root -4 0 992K 896K getblk 1 0:01 0.00% 0.00% makewhatis
>
> How long has it been "hung" for?  If you have a slow network you might
> be killing it while it is doing work.
>
> Do you have rpc.lockd and statd running on both client and server?

I have the same machine (Dell 2650) and it's getting locked up in a
very similar way. You don't need to get NFS involved to have processes
get locked up in getblk. I'm slowly trying to remove variables, but so
far it seems like network activity of some sort helps cause the lockup.
The easiest way to make it lock up was doing backups through the network.
But find processes cranked up by the nightly cron jobs can get locked in
getblk as well (while there are no NFS partitions mounted, but things
like cvsup updates of a local repo are happening).

Once things start to get locked up like this, the system slowly degrades.
I can usually ssh in and reboot it if I catch it soon enough. If I leave
it for a couple of days it will seem like it's up (rwhod is running), but
ssh-ing in won't work.

sledge (amd64 machine in the cluster) was showing similar symptoms this
morning. It had failed doing its nightly rebuild/reboot, and things like
mtree commands had been wedged for a day or two.

The Dell I have here is not really in production at all, so if my doing
anything here will help, I'm game...

-- 
                                                Ken Smith
- From there to here, from here to      |       kensmith_at_cse.buffalo.edu
  there, funny things are everywhere.   |
                      - Theodore Geisel |

Received on Wed Jan 21 2004 - 09:40:17 UTC