On Thu, 28 May 2009, Larry Rosenman wrote:

> On Thu, 28 May 2009, Kip Macy wrote:
>
>> On Tue, May 26, 2009 at 5:04 AM, Larry Rosenman <ler_at_lerctr.org> wrote:
>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>
>>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>>
>>>>> after looking at the code, never mind the "don't call doadump", so we'll
>>>>> get the textdump.
>>>>>
>>>>> Thanks rwatson for the textdump stuff!
>>>>>
>>>> Here is current stats before we crash. Does any of this look totally
>>>> out of line?
>>>>
>>> It crashed again, but did *NOT* make it into ddb enough to do the
>>> textdump.
>>>
>>> It was hung with the backtrace (looks like the same, but I couldn't
>>> scroll the screen back).
>>>
>>> Ideas?
>>>
>>> I'm really concerned that there is a problem.
>>>
>>
>> - Type of disks?
> 6 SATA Seagate 400GB (5) / 500 GB (1).
>
> ATA channel 0:
>     Master:  acd0 <Memorex DVD+-RAM 510L v1/MWS7> ATA/ATAPI revision 7
>     Slave:   no device present
> ATA channel 2:
>     Master:  ad4 <ST3400620AS/3.AAJ> SATA revision 2.x
>     Slave:   no device present
> ATA channel 3:
>     Master:  ad6 <ST3400620AS/3.AAJ> SATA revision 2.x
>     Slave:   no device present
> ATA channel 4:
>     Master:  ad8 <ST3500630AS/3.AAE> SATA revision 2.x
>     Slave:   no device present
> ATA channel 5:
>     Master: ad10 <ST3400620AS/3.AAJ> SATA revision 2.x
>     Slave:   no device present
> ATA channel 6:
>     Master: ad12 <ST3400620AS/3.AAJ> SATA revision 2.x
>     Slave:   no device present
> ATA channel 7:
>     Master: ad14 <ST3400620AS/3.AAJ> SATA revision 2.x
>     Slave:   no device present
>>
>> - Size of zpools?
> All 6.
>
>   pool: vault
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME      STATE   READ WRITE CKSUM
>         vault     ONLINE     0     0     0
>         raidz1    ONLINE     0     0     0
>         ad6       ONLINE     0     0     0
>         ad8       ONLINE     0     0     0
>         ad10      ONLINE     0     0     0
>         ad12      ONLINE     0     0     0
>         ad14      ONLINE     0     0     0
>         ad4s1f    ONLINE     0     0     0
>         ad4s1e    ONLINE     0     0     0
>         ad4s1d    ONLINE     0     0     0
>
> errors: 10 data errors, use '-v' for a list
>
>   pool: vault
>  state: ONLINE
> status: One or more devices has experienced an error resulting in data
>         corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
>         entire pool from backup.
>    see: http://www.sun.com/msg/ZFS-8000-8A
>  scrub: none requested
> config:
>
>         NAME      STATE   READ WRITE CKSUM
>         vault     ONLINE     0     0     0
>         raidz1    ONLINE     0     0     0
>         ad6       ONLINE     0     0     0
>         ad8       ONLINE     0     0     0
>         ad10      ONLINE     0     0     0
>         ad12      ONLINE     0     0     0
>         ad14      ONLINE     0     0     0
>         ad4s1f    ONLINE     0     0     0
>         ad4s1e    ONLINE     0     0     0
>         ad4s1d    ONLINE     0     0     0
>
> errors: Permanent errors have been detected in the following files:
>
>         /usr/local/sbin/p4d
>         /var/db/bacula/borg-dir.conmsg
>         vault/usr/obj:<0x16c3a>
>         vault/usr/obj:<0x169bb>
>         /usr/obj/usr/src/lib/libc/random.o
>
>>
>> - Compression enabled?
> Yes.
>

Ok, it just crashed. Unfortunately, I'm at work and the box is at home.

I did have my script running every minute of that entire boot.

What I saw was a full backup running, and then we started paging, and
then the backup jobs got pager errors, and were killed.

I'm not sure what else went on, so I restarted the bacula daemons that
got killed, and was in the bacula console when it died.

I'll see if I can get a cell-phone camera shot of the console.

I'll also tar up the vmstat outputs and put them on my web server.

What other forensics should I get?
Bear in mind the system is probably locked up with no dump taken :(

--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683             E-Mail: ler_at_lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893

Received on Fri May 29 2009 - 15:44:46 UTC