Hello,

I see the same problem here, and I have posted a backtrace with debug symbols in PR bin/61718. However, rpc.lockd seldom crashes with a segfault. More often it just stops responding and starts to eat 30-50% CPU, so I have to kill it and restart it (I have a small Perl script that does this for me). It might crash if I let it run longer, but it causes severe havoc on my network, so I have to restart it as soon as possible to make things work again.

The server hosts home directories for about 400k users. The clients are Linux mail (POP/IMAP) front ends accessing ~/mbox or IMAP folders, web servers hosting homepages, etc.

I'm having a hard time reproducing the error. I have set up a crash box and I'm trying as hard as I can to make it crash on demand, but no luck yet :-/

When the problem first occurs, I often have to restart it several times before things start working again. This is probably because the client retries whatever action made the server crash in the first place.

Best regards,
Frode

On Jan 28, 2004, at 11:17, Rory Arms wrote:
> -current developers,
>
> I've noticed, since upgrading from 5.1-RELEASE-p11 to 5.2-RELEASE a
> few weeks ago, rpc.lockd has been crashing repeatedly, though
> randomly. All the clients are MacOS X 10.3 machines and one 10.2.
> Though, I think it wasn't till 10.3, that client NFS locking was
> finally supported. It looks like they request locks for certain
> operations, such as when reading the address book file. It definitely
> had something to do with the new version, as it started occurring
> after the upgrade, with no other changes on the network.
>
> Here are the machine's specs:
>
> > cat /var/run/dmesg.boot
> Copyright (c) 1992-2004 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993,
> 1994
> The Regents of the University of California. All rights
> reserved.
> FreeBSD 5.2-RELEASE #3: Mon Jan 12 13:56:46 EST 2004
> Preloaded elf kernel "/boot/kernel/kernel" at 0xc083b000.
> Preloaded elf module "/boot/kernel/acpi.ko" at 0xc083b244.
> Timecounter "i8254" frequency 1193182 Hz quality 0
> CPU: Pentium II/Pentium II Xeon/Celeron (375.04-MHz 686-class CPU)
> Origin = "GenuineIntel" Id = 0x652 Stepping = 2
>
> Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE
> ,MCA,CMOV,PAT,PSE36,MMX,FXSR>
> real memory = 536739840 (511 MB)
> avail memory = 511721472 (488 MB)
> ACPI APIC Table: <TYANCP TYANTBLE>
> FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 1
> ioapic0 <Version 1.1> irqs 0-23 on motherboard
> Pentium Pro MTRR support enabled
> npx0: [FAST]
> npx0: <math processor> on motherboard
> npx0: INT 16 interface
> acpi0: <TYANCP TYANTBLE> on motherboard
> acpi0: Overriding SCI Interrupt from IRQ 9 to IRQ 20
> pcibios: BIOS version 2.10
> acpi0: Power Button (fixed)
> Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
> acpi_cpu0: <CPU> on acpi0
> acpi_cpu1: <CPU> on acpi0
> acpi_cpu1: Failed to attach throttling P_CNT
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> pcib1: <PCI-PCI bridge> at device 1.0 on pci0
> pci1: <PCI bus> on pcib1
> isab0: <PCI-ISA bridge> at device 7.0 on pci0
> isa0: <ISA bus> on isab0
> atapci0: <Intel PIIX4 UDMA33 controller> port 0xffa0-0xffaf at device
> 7.1 on pci0
> ata0: at 0x1f0 irq 14 on atapci0
> ata0: [MPSAFE]
> ata1: at 0x170 irq 15 on atapci0
> ata1: [MPSAFE]
> pci0: <serial bus, USB> at device 7.2 (no driver attached)
> pci0: <bridge, PCI-unknown> at device 7.3 (no driver attached)
> pcib2: <PCI-PCI bridge> at device 16.0 on pci0
> pci2: <PCI bus> on pcib2
> pcib2: slot 5 INTA is routed to irq 17
> fxp0: <Intel 82559 Pro/100 Ethernet> port 0xdf00-0xdf3f mem
> 0xfd500000-0xfd5fffff,0xfd6ff000-0xfd6fffff irq 17 at device 5.0 on
> pci2
> fxp0: Ethernet address 00:90:27:ee:02:97
> miibus0: <MII bus> on fxp0
> inphy0: <i82555 10/100 media interface> on miibus0
> inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> fxp1: <Intel 82558 Pro/100 Ethernet> port 0xef40-0xef5f mem
> 0xfea00000-0xfeafffff,0xffaff000-0xffafffff irq 19 at device 17.0 on
> pci0
> fxp1: Ethernet address 00:e0:81:10:22:27
> miibus1: <MII bus> on fxp1
> inphy1: <i82555 10/100 media interface> on miibus1
> inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
> ahc0: <Adaptec aic7895 Ultra SCSI adapter> port 0xe400-0xe4ff mem
> 0xfebee000-0xfebeefff irq 16 at device 18.0 on pci0
> aic7895C: Ultra Wide Channel A, SCSI Id=7, 32/253 SCBs
> ahc1: <Adaptec aic7895 Ultra SCSI adapter> port 0xe800-0xe8ff mem
> 0xfebef000-0xfebeffff irq 16 at device 18.1 on pci0
> aic7895C: Ultra Wide Channel B, SCSI Id=7, 32/253 SCBs
> ahc2: <Adaptec 2902/04/10/15/20C/30C SCSI adapter> port 0xe000-0xe0ff
> mem 0xfebed000-0xfebedfff irq 16 at device 19.0 on pci0
> aic7850: Single Channel A, SCSI Id=7, 3/253 SCBs
> pci0: <display, VGA> at device 20.0 (no driver attached)
> acpi_button0: <Sleep Button> on acpi0
> atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
> atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
> kbd0 at atkbd0
> fdc0: cmd 3 failed at out byte 1 of 3
> sio0 port 0x3f8-0x3ff irq 4 on acpi0
> sio0: type 16550A
> sio1 port 0x2f8-0x2ff irq 3 on acpi0
> sio1: type 16550A
> fdc0: cmd 3 failed at out byte 1 of 3
> orm0: <Option ROMs> at iomem 0xc8000-0xcc7ff,0xc0000-0xc7fff on isa0
> pmtimer0 on isa0
> fdc0: <Enhanced floppy controller (i82077, NE72065 or clone)> at port
> 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on isa0
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: <1440-KB 3.5" drive> on fdc0 drive 0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on
> isa0
> Timecounters tick every 10.000 msec
> ipfw2 initialized, divert enabled, rule-based forwarding enabled,
> default to accept, logging limited to 50 packets/entry by default
> IPsec: Initialized Security Association Processing.
> acpi_cpu: throttling enabled, 8 steps (100% to 12.5%), currently 100.0%
> GEOM: create disk ad0 dp=0xc4835760
> ad0: 57259MB <MAXTOR 6L060J3> [116336/16/63] at ata0-master UDMA33
> Waiting 15 seconds for SCSI devices to settle
> GEOM: create disk da0 dp=0xc48dd050
> GEOM: create disk da1 dp=0xc48cbc50
> da0 at ahc0 bus 0 target 2 lun 0
> da0: <SEAGATE ST410800N 7117> Fixed Direct Access SCSI-2 device
> da0: 10.000MB/s transfers (10.000MHz, offset 15)
> da0: 8347MB (17096357 512 byte sectors: 255H 63S/T 1064C)
> da1 at ahc0 bus 0 target 6 lun 0
> da1: <QUANTUM FIREBALL SE8.4S PJ0A> Fixed Direct Access SCSI-2 device
> da1: 20.000MB/s transfers (20.000MHz, offset 15), Tagged Queueing
> Enabled
> da1: 8191MB (16777215 512 byte sectors: 255H 63S/T 1044C)
> SMP: AP CPU #1 Launched!
> Mounting root from ufs:/dev/da0s1a
>
> So, it is a dual Pentium II system, using the Tyan Thunder 100
> motherboard.
>
> I can provide the kern conf file, if needed. It is using the default
> SCHED_4BSD scheduler.
>
> Anyhow, the client stalls when this happens. I have to run
> "/etc/rc.d/nfslocking restart" on the server to get it going again.
> Here's the log entry I see when it crashes:
>
> Jan 16 01:11:02 Tserver kernel: pid 424 (rpc.lockd), uid 0: exited on
> signal 11
> (core dumped)
>
> I've found the core file it leaves behind. Here's some probing with
> gdb. I'm not really proficient with gdb, so if more is needed let me
> know what kind of extra information you need.
>
> > sudo gdb -c /rpc.lockd.core /usr/sbin/rpc.lockd
> GNU gdb 5.2.1 (FreeBSD)
> Copyright 2002 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "i386-unknown-freebsd"...
> (no debugging symbols found)...
> Core was generated by `rpc.lockd'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /usr/lib/librpcsvc.so.2...(no debugging symbols
> found)...
> done.
> Loaded symbols for /usr/lib/librpcsvc.so.2
> Reading symbols from /lib/libutil.so.4...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libutil.so.4
> Reading symbols from /lib/libc.so.5...(no debugging symbols
> found)...done.
> Loaded symbols for /lib/libc.so.5
> Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols
> found)...
> done.
> Loaded symbols for /libexec/ld-elf.so.1
> #0 0x0804dd2f in sigprocmask ()
> (gdb) bt
> #0 0x0804dd2f in sigprocmask ()
> #1 0x080507f6 in _fini ()
> #2 0x0804e39c in sigprocmask ()
> #3 0x0804ec40 in sigprocmask ()
> #4 0x0804f1a0 in sigprocmask ()
> #5 0x0804f52e in sigprocmask ()
> #6 0x0804cd73 in sigprocmask ()
> #7 0x0804aec4 in sigprocmask ()
> #8 0x280fe838 in svc_getreq_common () from /lib/libc.so.5
> #9 0x280fe61f in svc_getreqset () from /lib/libc.so.5
> #10 0x280bde94 in svc_run () from /lib/libc.so.5
> #11 0x0804b378 in sigprocmask ()
> #12 0x080498a2 in sigprocmask ()
> (gdb)
>
> -rory
>
> --
> Name: Rory Arms | TZ: GMT-5 | Web: http://www.TrueStep.com/~rory/
> Email: rorya_at_TrueStep.com | Format: RFC-822 compliant
> Finger: rorya_at_TrueStep.com for info. | Telephone: +1 859-225-3833
> "The mind's the standard of a man" --Joseph Merrick
>
> --
> Name: Rory Arms Email: rorya_at_TrueStep.com
> Tel: +1 859-225-3833 Time Zone: GMT -5
> I went fishing with a dotted line...I caught every other fish. -Steven
> Wright
>
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to
> "freebsd-current-unsubscribe_at_freebsd.org"

--
drift | frode nordahl
powertech information systems
nedre slottsgate 5
0157 oslo
tlf | + 47 23 01 00 00
fax | + 47 23 01 00 01
dir | + 47 23 01 00 45
email | frode_at_powertech.no
web | www.powertech.no

Received on Thu Jan 29 2004 - 00:51:35 UTC
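
The Perl restart script mentioned in the reply is not shown anywhere in this thread. As a rough illustration only, a watchdog of that kind might look like the Python sketch below. It samples `ps`, treats a missing rpc.lockd as a crash, and restarts after several consecutive high-CPU samples. The 30% threshold is taken from the "30-50% cpu" figure in the reply and the restart command is the one quoted from Rory's message. The function names, the three-sample rule, and the polling interval are all hypothetical.

```python
import subprocess
import time

CPU_LIMIT = 30.0   # percent; assumed threshold based on the "30-50% cpu" report
STRIKES = 3        # hypothetical: consecutive hot samples before restarting
RESTART_CMD = ["/etc/rc.d/nfslocking", "restart"]  # command quoted in the thread


def lockd_cpu(ps_output, name="rpc.lockd"):
    """Return the highest %CPU among processes called `name`, or None if absent.

    Expects lines of "<%cpu> <command>", e.g. from `ps -axo %cpu,comm`.
    The header line and unparsable lines are skipped.
    """
    best = None
    for line in ps_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or fields[1].strip() != name:
            continue
        try:
            cpu = float(fields[0].replace(",", "."))
        except ValueError:
            continue
        best = cpu if best is None else max(best, cpu)
    return best


def next_state(cpu, strikes, limit=CPU_LIMIT, max_strikes=STRIKES):
    """Return (restart_now, new_strike_count) for one CPU sample."""
    if cpu is None:
        return True, 0            # daemon is gone, e.g. after a segfault
    if cpu >= limit:
        strikes += 1
        if strikes >= max_strikes:
            return True, 0        # busy for too many samples in a row
        return False, strikes
    return False, 0               # healthy sample resets the counter


def watch(interval=10):
    """Poll forever and restart NFS locking when rpc.lockd looks stuck."""
    strikes = 0
    while True:
        out = subprocess.run(["ps", "-axo", "%cpu,comm"],
                             capture_output=True, text=True).stdout
        restart, strikes = next_state(lockd_cpu(out), strikes)
        if restart:
            subprocess.run(RESTART_CMD)
        time.sleep(interval)
```

Calling `watch()` as root would start the loop. Requiring several consecutive samples is a design guess meant to avoid restarting on a short burst of legitimate lock traffic, and the `%cpu` value reported by `ps` is a decaying average, so the threshold would need tuning on a real server.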