On Jul 9, 2009, at 11:01, Thomas Backman wrote:

> On Jun 20, 2009, at 09:11, Thomas Backman wrote:
>
>> I just ran into this tonight. Not sure exactly what triggered it -
>> the box stopped responding to pings at 02:07AM and it has a cron
>> backup job using zfs send/recv at 02:00, so I'm guessing it's
>> related, even though the backup probably should have finished
>> before then... Hmm. Anyway.
>>
>> r194478.
>>
>> kernel trap 12 with interrupts disabled
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address = 0x288
>> fault code            = supervisor read data, page not present
>> instruction pointer   = 0x20:0xffffffff805a4989
>> stack pointer         = 0x28:0xffffff803e8b57e0
>> frame pointer         = 0x28:0xffffff803e8b5840
>> code segment          = base 0x0, limit 0xfffff, type 0x1b
>>                       = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags      = resume, IOPL = 0
>> current process       = 57514 (zpool)
>> panic: from debugger
>> cpuid = 0
>> Uptime: 10h22m13s
>> Physical memory: 2027 MB
>>
>> (kgdb) bt
>> #0  doadump () at pcpu.h:223
>> #1  0xffffffff8059c409 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:419
>> #2  0xffffffff8059c85c in panic (fmt=Variable "fmt" is not available.) at /usr/src/sys/kern/kern_shutdown.c:575
>> #3  0xffffffff801f1377 in db_panic (addr=Variable "addr" is not available.) at /usr/src/sys/ddb/db_command.c:478
>> #4  0xffffffff801f1781 in db_command (last_cmdp=0xffffffff80c38620, cmd_table=Variable "cmd_table" is not available.) at /usr/src/sys/ddb/db_command.c:445
>> #5  0xffffffff801f19d0 in db_command_loop () at /usr/src/sys/ddb/db_command.c:498
>> #6  0xffffffff801f3969 in db_trap (type=Variable "type" is not available.) at /usr/src/sys/ddb/db_main.c:229
>> #7  0xffffffff805ce465 in kdb_trap (type=12, code=0, tf=0xffffff803e8b5730) at /usr/src/sys/kern/subr_kdb.c:534
>> #8  0xffffffff8088715d in trap_fatal (frame=0xffffff803e8b5730, eva=Variable "eva" is not available.
>> ) at /usr/src/sys/amd64/amd64/trap.c:847
>> #9  0xffffffff80887fb2 in trap (frame=0xffffff803e8b5730) at /usr/src/sys/amd64/amd64/trap.c:345
>> #10 0xffffffff8086e007 in calltrap () at /usr/src/sys/amd64/amd64/exception.S:223
>> #11 0xffffffff805a4989 in _sx_xlock_hard (sx=0xffffff0043557d50, tid=18446742975830720512, opts=Variable "opts" is not available.) at /usr/src/sys/kern/kern_sx.c:575
>> #12 0xffffffff805a52fe in _sx_xlock (sx=Variable "sx" is not available.) at sx.h:155
>> #13 0xffffffff80fe2995 in zfs_freebsd_reclaim () from /boot/kernel/zfs.ko
>> #14 0xffffffff808cefca in VOP_RECLAIM_APV (vop=0xffffff0043557d38, a=0xffffff0043557d50) at vnode_if.c:1926
>> #15 0xffffffff80626f6e in vgonel (vp=0xffffff00437a7938) at vnode_if.h:830
>> #16 0xffffffff8062b528 in vflush (mp=0xffffff0060f2a000, rootrefs=0, flags=0, td=0xffffff0061528000) at /usr/src/sys/kern/vfs_subr.c:2450
>> #17 0xffffffff80fdd3a8 in zfs_umount () from /boot/kernel/zfs.ko
>> #18 0xffffffff8062420a in dounmount (mp=0xffffff0060f2a000, flags=1626513408, td=Variable "td" is not available.) at /usr/src/sys/kern/vfs_mount.c:1287
>> #19 0xffffffff80624975 in unmount (td=0xffffff0061528000, uap=0xffffff803e8b5c00) at /usr/src/sys/kern/vfs_mount.c:1172
>> #20 0xffffffff8088783f in syscall (frame=0xffffff803e8b5c90) at /usr/src/sys/amd64/amd64/trap.c:984
>> #21 0xffffffff8086e290 in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:364
>> #22 0x000000080104e49c in ?? ()
>> Previous frame inner to this frame (corrupt stack?)
>
> Here's the script in its relevant entirety [...]
>
> #!/bin/bash
>
> PATH="$PATH:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin"
>
> function die() {
>     echo "$@" 2>&1
>     zpool export slave 2>&1 > /dev/null
>     ggatel destroy -u 666 2>&1 > /dev/null
>     exit 1
> }
>
> function mount_unmount {
>     if [ -z "$1" ]; then
>         die 'Invalid argument given to mount_unmount'
>     elif [[ "$1" == "mount" ]]; then
>         zpool list | grep -q slave
>         if [ "$?" = "0" ]; then
>             echo Already mounted
>             return 0
>         fi
>         echo Creating ggate device
>         ggatel create -u 666 /mnt/backup/chaos/slavefile || die 'Unable to create GEOM provider from file'
>         echo 'Sleeping for 5 seconds...'
>         sleep 5
>         echo Importing pool
>         zpool import -R /slave slave || die 'Unable to import slave pool'
>     elif [[ "$1" == "unmount" ]]; then
>         echo Exporting pool
>         zpool export slave || die 'Unable to export slave pool'
>         ggatel destroy -u 666 || die 'Unable to destroy GEOM provider'
>     fi
> }
>
> if [ ! -z "$1" ]; then
>     case $1 in
>         mount) mount_unmount mount; exit 0;;
>         unmount) mount_unmount unmount; exit 0;;
>         initial) initial; exit 0;;
>         backup) ;;
>         *) help;;
>     esac
> else
>     help
> fi
>
> if [ ! -f "/mnt/backup/chaos/slavefile" ]; then
>     echo 'Backup error! slavefile does not exist!' | mail -s "Backup error" root
>     echo 'Slavefile does not exist!'
>     exit 1
> fi
>
> mount_unmount mount
>
> CURR=$(date +"backup-%F-%H%M")
>
> echo Taking snapshots
> zfs snapshot -r tank@$CURR || die "Unable to create $CURR snapshot"
>
> echo Starting backup...
> LAST=$(cat /root/.last-backup)
> zfs send -R -I $LAST tank@$CURR | zfs recv -Fvd slave
>
> echo $CURR > /root/.last-backup
>
> mount_unmount unmount
>
> echo Running rsync
> rsync -av --delete /bootdir/boot exscape::backup-freebsd/chaos
> rsync -av --delete /root exscape::backup-freebsd/chaos
> rsync -av --delete ~serenity exscape::backup-freebsd/chaos
>
> echo 'All done!'
>
> -------------------

Sorry for the monologue, but I ran into this again, this time with a panic again.
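One remark on the script above: a plain `zfs send | zfs recv` pipeline reports only the exit status of `zfs recv`, so a failed send can go unnoticed and `.last-backup` gets updated anyway. A minimal sketch of the guard using bash's `pipefail` option (the `run_backup` name and its stand-in pipeline are mine, not from the script; the real pipeline is shown in the comment):

```shell
#!/bin/bash
# Sketch only: with 'set -o pipefail', a pipeline fails if ANY stage
# fails, so .last-backup can be updated only on a fully successful
# send/recv. The zfs commands are commented out stand-ins here.
set -o pipefail

false | true
echo "pipeline status: $?"   # 1 with pipefail; would be 0 without

run_backup() {
    # zfs send -R -I "$LAST" "tank@$CURR" | zfs recv -Fvd slave
    false | true             # stand-in for a failed send
}

if run_backup; then
    echo "updating .last-backup"
else
    echo "send/recv failed; keeping old .last-backup"
fi
```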
Similar but not identical to the old one.

OK, so I figured I would update my "untouched" src clone (used to save bandwidth from the FBSD SVN server when I feel I need to start with a *clean* source tree), now that there have been quite a few changes since that revision. I pretty much did this (if other shells are different, !$ in bash is the last argument to the previous command):

1) Clean up /usr/src from "my" stuff
2) svn update
3) svn diff and svn status, to make sure it was clean
4) zfs promote tank/usr/src  ## usr/src was a clone of the untouched, read-only fs "tank/usr/src_r194478-UNTOUCHED"
5) zfs destroy -r tank/usr/src_r194478-UNTOUCHED  ## The old one, obviously
6) zfs snapshot tank/usr/src@r195488_UNTOUCHED
7) zfs clone !$ tank/usr/src_r195488-UNTOUCHED
8) zfs promote !$
9) zfs set readonly=on !$
10) And, in case it may matter, I slightly modified the contents of /usr/src afterwards (applied two patches that aren't merged into HEAD (yet?)).

... I THINK that's it. Since my bash_history got killed in the panic, I may be slightly wrong. In any case, usr/src is now a clone of the readonly UNTOUCHED fs:

[root@chaos ~]# zfs get origin tank/usr/src
NAME          PROPERTY  VALUE                                              SOURCE
tank/usr/src  origin    tank/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED  -

I then ran the backup script just posted in this thread:

[root@chaos ~]# bash backup.sh backup
Creating ggate device
Sleeping for 5 seconds...
Importing pool
Taking snapshots
Starting backup...
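For reference, steps 4-9 above amount to this clone rotation; sketched here as a dry run with `zfs` stubbed out to echo its arguments (no pool is touched), using the dataset names from the list:

```shell
#!/bin/bash
# Dry-run sketch of the clone rotation in steps 4-9 above. 'zfs' is a
# stub that only prints the command, so this runs without a pool.
zfs() { echo "zfs $*"; }

OLD=tank/usr/src_r194478-UNTOUCHED
NEW=tank/usr/src_r195488-UNTOUCHED
SNAP=tank/usr/src@r195488_UNTOUCHED

zfs promote tank/usr/src     # usr/src stops depending on $OLD
zfs destroy -r "$OLD"        # drop the old read-only copy
zfs snapshot "$SNAP"         # freeze the freshly updated tree
zfs clone "$SNAP" "$NEW"     # new "untouched" filesystem
zfs promote "$NEW"           # make the new clone the origin again
zfs set readonly=on "$NEW"
```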
attempting destroy slave/usr/src_r194478-UNTOUCHED@backup-20090709-1250
success
attempting destroy slave/usr/src_r194478-UNTOUCHED@r194478-UNTOUCHED
failed - trying rename to slave/usr/src_r194478-UNTOUCHED@recv-38883-1
failed (2) - will try again on next pass
attempting destroy slave/usr/src_r194478-UNTOUCHED@backup-20090709-1235
success
attempting destroy slave/usr/src_r194478-UNTOUCHED
failed - trying rename to slave/recv-38883-2
failed (2) - will try again on next pass
promoting slave/usr/src
another pass:
attempting destroy slave/usr/src_r194478-UNTOUCHED
success
attempting destroy slave/usr/src@r194478-UNTOUCHED
success
attempting rename slave/usr/src to slave/usr/src_r195488_UNTOUCHED
success
receiving incremental stream of tank@backup-20090709-1328 into slave@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/tmp@backup-20090709-1328 into slave/tmp@backup-20090709-1328
received 119KB stream in 1 seconds (119KB/sec)
receiving incremental stream of tank/var@backup-20090709-1328 into slave/var@backup-20090709-1328
received 211KB stream in 1 seconds (211KB/sec)
receiving incremental stream of tank/var/log@backup-20090709-1328 into slave/var/log@backup-20090709-1328
received 468KB stream in 1 seconds (468KB/sec)
receiving incremental stream of tank/var/crash@backup-20090709-1328 into slave/var/crash@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/root@backup-20090709-1328 into slave/root@backup-20090709-1328
received 156KB stream in 1 seconds (156KB/sec)
receiving incremental stream of tank/usr@backup-20090709-1328 into slave/usr@backup-20090709-1328
received 302KB stream in 1 seconds (302KB/sec)
receiving incremental stream of tank/usr/obj@backup-20090709-1328 into slave/usr/obj@backup-20090709-1328
received 8.52MB stream in 8 seconds (1.07MB/sec)
receiving incremental stream of tank/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED into slave/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED
received 112MB stream in 43 seconds (2.60MB/sec)
receiving incremental stream of tank/usr/src_r195488_UNTOUCHED@backup-20090709-1328 into slave/usr/src_r195488_UNTOUCHED@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/usr/ports@backup-20090709-1328 into slave/usr/ports@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of tank/usr/ports/distfiles@backup-20090709-1328 into slave/usr/ports/distfiles@backup-20090709-1328
received 312B stream in 1 seconds (312B/sec)
found clone origin slave/usr/src_r195488_UNTOUCHED@r195488_UNTOUCHED
receiving incremental stream of tank/usr/src@backup-20090709-1328 into slave/usr/src@backup-20090709-1328
received 216KB stream in 1 seconds (216KB/sec)
local fs slave/usr/src does not have fromsnap (backup-20090709-1250 in stream); must have been deleted locally; ignoring
Exporting pool
Read from remote host 192.168.1.10: Operation timed out

... and the DDB output (first part copy/paste from kgdb, second part handwritten since the kgdb BT was totally broken, as I expected.)
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x70
fault code            = supervisor write data, page not present
instruction pointer   = 0x20:0xffffffff8036e855
stack pointer         = 0x28:0xffffff803ea637d0
frame pointer         = 0x28:0xffffff803ea637e0
code segment          = base 0x0, limit 0xfffff, type 0x1b
                      = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags      = interrupt enabled, resume, IOPL = 0
current process       = 38905 (zpool)

_sx_slock()
dmu_buf_update_user()+0x47
zfs_znode_dmu_fini()
zfs_freebsd_reclaim()
VOP_RECLAIM_APV()
vgonel()
vflush()
zfs_umount()
dounmount()
unmount()
syscall()
Xfast_syscall()

BTW, what's with the "local fs slave/usr/src does not have fromsnap (backup-20090709-1250 in stream); must have been deleted locally; ignoring" part?

This is what I get when I try an incremental backup now:

[root@chaos ~]# bash backup.sh backup
Creating ggate device
Sleeping for 5 seconds...
Importing pool
Taking snapshots
Starting backup...
local fs slave/usr/src does not have fromsnap (backup-20090709-1250 in stream); must have been deleted locally; ignoring
receiving incremental stream of tank@backup-20090709-1328 into slave@backup-20090709-1328
snap slave@backup-20090709-1328 already exists; ignoring
local fs slave/usr/src does not have fromsnap (backup-20090709-1250 in stream); must have been deleted locally; ignoring
warning: cannot send 'tank/tmp@backup-20090709-1328': Broken pipe
warning: cannot send 'tank/tmp@backup-20090709-1406': Broken pipe
Exporting pool
Running rsync

... rsync runs, no panic, but no ZFS backup, either. Guess it's time for another "initial" backup, i.e. start all over with 0 snapshots... The initial backup worked just fine, it found the clone/origin etc. and made it work.
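Incidentally, a guard along these lines would catch a stale /root/.last-backup before the pipeline dies with "Broken pipe": check that the recorded snapshot still exists on both pools before attempting the incremental send. This is a sketch, not the script's actual code; `zfs` is stubbed with canned snapshot names so it runs without a pool, and `snapshot_exists` is a name I made up:

```shell
#!/bin/bash
# Sketch: verify the snapshot named in .last-backup still exists on
# both pools before 'zfs send -I'. The 'zfs' stub returns canned data
# (the snapshot names from this thread) so the sketch is runnable.
zfs() { printf '%s\n' tank@backup-20090709-1328 slave@backup-20090709-1328; }

LAST=backup-20090709-1250      # e.g. LAST=$(cat /root/.last-backup)

snapshot_exists() {            # snapshot_exists <pool> <snapname>
    zfs list -H -t snapshot -o name -r "$1" | grep -qx "$1@$2"
}

if snapshot_exists tank "$LAST" && snapshot_exists slave "$LAST"; then
    echo "incremental from $LAST is possible"
else
    echo "$LAST missing on tank or slave; fall back to a full backup"
fi
```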
Stripped from comments and echo statements, the function is simply:

function initial {
    for BACK in $(zfs list -t snapshot -H -r tank | awk '{print $1}'); do zfs destroy $BACK; done
    zpool destroy slave 2>/dev/null; sleep 3; ggatel destroy -u 666 2>/dev/null; sleep 3  # Clean up if needed
    ggatel create -u 666 /mnt/backup/chaos/slavefile; sleep 3
    zpool create -f -R /slave slave ggate666 && NOW=$(date +"backup-%Y%m%d-%H%M") || die 'Unable to create pool'
    zfs snapshot -r tank@$NOW || die 'Unable to snapshot'
    zfs send -R tank@$NOW | zfs recv -vFd slave
    mount_unmount unmount
    echo $NOW > /root/.last-backup
}

After that, incrementals are fine again.

Regards,
Thomas

Received on Thu Jul 09 2009 - 10:36:28 UTC