This is the error I got when I ran the failover script:

Sep 24 06:43:39 san1 hastd[3404]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Sep 24 06:43:39 san1 hastd[3343]: [disk3] (primary) Worker process exited ungracefully (pid=3404, exitcode=66).
Sep 24 06:43:39 san1 hastd[3413]: [disk6] (primary) Provider /dev/mfid6 is not part of resource disk6.
Sep 24 06:43:39 san1 hastd[3343]: [disk6] (primary) Worker process exited ungracefully (pid=3413, exitcode=66).
Sep 24 06:43:39 san1 hastd[3425]: [disk10] (primary) Unable to open /dev/mfid10: No such file or directory.
Sep 24 06:43:39 san1 hastd[3407]: [disk4] (primary) Provider /dev/mfid4 is not part of resource disk4.
Sep 24 06:43:39 san1 hastd[3343]: [disk10] (primary) Worker process exited ungracefully (pid=3425, exitcode=66).
Sep 24 06:43:39 san1 hastd[3410]: [disk5] (primary) Provider /dev/mfid5 is not part of resource disk5.
Sep 24 06:43:39 san1 hastd[3343]: [disk4] (primary) Worker process exited ungracefully (pid=3407, exitcode=66).
Sep 24 06:43:39 san1 hastd[3416]: [disk7] (primary) Provider /dev/mfid7 is not part of resource disk7.
Sep 24 06:43:39 san1 hastd[3422]: [disk9] (primary) Provider /dev/mfid9 is not part of resource disk9.
Sep 24 06:43:39 san1 hastd[3419]: [disk8] (primary) Provider /dev/mfid8 is not part of resource disk8.
Sep 24 06:43:39 san1 hastd[3343]: [disk5] (primary) Worker process exited ungracefully (pid=3410, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk9] (primary) Worker process exited ungracefully (pid=3422, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk8] (primary) Worker process exited ungracefully (pid=3419, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk7] (primary) Worker process exited ungracefully (pid=3416, exitcode=66).
Sep 24 06:43:40 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:45 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:43:50 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:55 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:00 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:44:05 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:10 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).

Is there any patch I need to run to fix this issue?

From: Jose A. Lombera [mailto:jose_at_lajni.com]
Sent: Sunday, September 23, 2012 10:00 PM
To: freebsd-current_at_freebsd.org
Cc: freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Every time I run this for any of disks 3, 4, 5, 6, 7, 8, 9, or 10 (only disks 1 and 2 show up in /dev/hast):

[root_at_san2 /usr/home/jose]# hastctl role primary disk3
[root_at_san2 /usr/home/jose]#

I get this in the logs:

Sep 23 21:58:13 san2 hastd[2793]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.

Please help. Thanks.
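A note on those messages: "Provider /dev/mfidX is not part of resource diskX" generally means hastd found no matching HAST metadata on that provider, and the split-brain / resource-unique-ID-mismatch messages mean the two nodes' copies of disk1 and disk2 have diverged. There is no patch for this. The usual recovery is to decide which node's data to keep and throw the other side's copy away; the sketch below assumes, purely for illustration, that san1 holds the data to keep and san2's copy of disk1 can be discarded (repeat per affected resource):

  # On the node whose copy is being discarded (san2 in this illustration):
  hastctl role init disk1
  hastctl create disk1           # overwrites the local HAST metadata for disk1
  hastctl role secondary disk1

  # On the node whose copy is being kept (san1 in this illustration):
  hastctl role primary disk1
  hastctl status disk1           # "dirty" should shrink back to 0 as it resyncs

Because hastctl create wipes the local metadata, it must only ever be run on the side whose data is being thrown away, and only while the other node still has a good copy.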
From: Jose A. Lombera [mailto:jose_at_lajni.com]
Sent: Sunday, September 23, 2012 9:46 PM
To: 'Freddie Cash'
Cc: freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Please, someone help me! I screwed up big time.

I was doing the "hastctl create disk2", but since I got some input/output errors I decided to stop hastd with /etc/rc.d/hastd stop. It couldn't stop disk1 and disk9, so I killed it and restarted both servers. Now /dev/hast shows nothing and the pool is lost. I was able to create disk2, and I have restarted both servers, but the pool is not coming up. Any suggestions? Please help. I know the data is still there, since I only did "hastctl create disk2"; I haven't done it for the other disks.

From: Jose A. Lombera [mailto:jose_at_lajni.com]
Sent: Sunday, September 23, 2012 8:10 PM
To: 'Freddie Cash'
Cc: freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Freddie,

Thanks for your great help; it makes much more sense now. I still have a small problem, and I'm not sure if it is because hastd is running: I can't initialize disk2 (hastctl create disk2). This is what I did:

1. zpool offline tank hast/disk2

2. zpool status -x

[root_at_san /usr/home/jose]# zpool status -x
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            hast/disk1            ONLINE       0     0     0
            11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
            hast/disk3            ONLINE       0     0     0
            hast/disk4            ONLINE       0     0     0
            hast/disk5            ONLINE       0     0     0
            hast/disk6            ONLINE       0     0     0
            hast/disk7            ONLINE       0     0     0
            hast/disk8            ONLINE       0     0     0
            hast/disk9            ONLINE       0     0     0
            hast/disk10           ONLINE       0     0     0

errors: No known data errors

3. Removed the disk / inserted a new one.

4. Initialize: hastctl role init disk2

[root_at_san /usr/home/jose]# hastctl status disk2
disk2:
  role: init
  provname: disk2
  localpath: /dev/mfid2
  extentsize: 0 (0B)
  keepdirty: 0
  remoteaddr: san1
  replication: fullsync
  dirty: 0 (0B)
  statistics:
    reads: 0
    writes: 0
    deletes: 0
    flushes: 0
    activemap updates: 0
[root_at_san /usr/home/jose]# hastctl create disk2
[ERROR] [disk2] Unable to write metadata: Input/output error.

I don't want to stop hastd, since it will shut down the connection to my SAN. Do you have any suggestion?

Thanks,
--jose

-----Original Message-----
From: owner-freebsd-current_at_freebsd.org [mailto:owner-freebsd-current_at_freebsd.org] On Behalf Of Freddie Cash
Sent: Sunday, September 23, 2012 6:30 PM
To: compufutura -the computer of the future
Cc: yanegomi_at_gmail.com; freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2

Since it's a HAST device, you have to initialise the disk via hastctl. Once that is done, the /dev/hast/disk2 GEOM device node will be created. Then you can 'zpool replace' it.

One step at a time. :)  And you've skipped a few (a command-level sketch follows this list):

1. 'zpool offline' the defective disk
2. Physically remove the defective disk
3. Physically insert the new disk
4. Initialise it as a HAST resource via 'hastctl'
5. 'zpool replace' it using the /dev/hast node
6. Wait for the pool (and HAST) to resilver it
7. Carry on as per normal
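Spelled out as commands, that list looks roughly like the sketch below. It uses the names from this thread (pool tank, resource disk2, local provider /dev/mfid2), is run on the node that is currently primary for disk2, and assumes the replacement disk shows up as the same provider named in /etc/hast.conf:

  # 1. 'zpool offline' the defective disk
  zpool offline tank hast/disk2

  # 2-3. physically swap the disk, then:

  # 4. initialise the new disk as HAST resource disk2
  hastctl role init disk2        # make sure no worker still holds the old provider
  hastctl create disk2           # write fresh HAST metadata onto /dev/mfid2
  hastctl role primary disk2     # this is what creates /dev/hast/disk2

  # 5. 'zpool replace' it using the /dev/hast node
  zpool replace tank hast/disk2

  # 6. wait for the pool (and HAST) to resilver
  zpool status tank
  hastctl status disk2

The ordering matters: the earlier "zpool replace tank hast/disk2" attempts failed with "no such GEOM provider" because, until the resource has been created and switched to primary (step 4), the /dev/hast/disk2 node does not exist. And if "hastctl create" itself keeps failing with an input/output error on a brand-new disk, the provider is probably not writable yet (for example, an mfi RAID controller may not present a usable /dev/mfid2 until the new drive has been configured on the controller), which is a disk/controller problem rather than a HAST one.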
On Sep 23, 2012 2:28 PM, "compufutura -the computer of the future" <jose_at_compufutura.com> wrote:

> Yanegomi,
>
> I tried that. As you can see below, FreeBSD doesn't have the cfgadm
> utility to unconfigure the device, as described in
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html, and I
> looked in ports but there is no utility like that.
>
> Pardon me, my knowledge is little.
>
> Can you please type the command I will need? Or, if I do need cfgadm,
> do I have to look for it and install it on my FreeBSD box?
>
> Thanks.
>
> [root_at_san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root_at_san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
>
> [root_at_san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
>
> [root_at_san1 /usr/home/jose]# cfgadm
> bash: cfgadm: command not found
>
> [root_at_san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root_at_san1 /usr/home/jose]# zpool status -x
>   pool: tank
>  state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
>   scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
>
> [root_at_san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
>
> [root_at_san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
>
> From: Garrett Cooper <yanegomi_at_gmail.com>
> Date: September 23, 2012 12:25:52 PM PDT
> To: "Jose A. Lombera" <jose_at_lajni.com>
> Cc: freebsd-current_at_freebsd.org
> Subject: Re: zpool can't bring online disk2
>
> On Sun, Sep 23, 2012 at 11:23 AM, Jose A. Lombera <jose_at_lajni.com> wrote:
> >
> > Hello all,
> >
> > I hope someone can help me out with this.
> >
> > Recently disk2 went bad. I used
> >
> >     zpool offline tank hast/disk2
> >
> > to take the disk offline, then I replaced it and used
> >
> >     zpool online tank hast/disk2
> >
> > but the disk shows as REMOVED:
> >
> > [root_at_san1 /usr/home/jose]# zpool status -v
> >   pool: tank
> >  state: DEGRADED
> > status: One or more devices has been removed by the administrator.
> >         Sufficient replicas exist for the pool to continue functioning in a
> >         degraded state.
> > action: Online the device using 'zpool online' or replace the device with
> >         'zpool replace'.
> >   scan: resilvered 2.49M in 0h2m with 0 errors on Sat Sep 22 01:03:13 2012
> > config:
> >
> >         NAME                      STATE     READ WRITE CKSUM
> >         tank                      DEGRADED     0     0     0
> >           raidz1-0                DEGRADED     0     0     0
> >             hast/disk1            ONLINE       0     0     0
> >             11919832608590631234  REMOVED      0     0     0  was /dev/hast/disk2
> >             hast/disk3            ONLINE       0     0     0
> >             hast/disk4            ONLINE       0     0     0
> >             hast/disk5            ONLINE       0     0     0
> >             hast/disk6            ONLINE       0     0     0
> >             hast/disk7            ONLINE       0     0     0
> >             hast/disk8            ONLINE       0     0     0
> >             hast/disk9            ONLINE       0     0     0
> >             hast/disk10           ONLINE       0     0     0
> >
> > [root_at_san1 /usr/home/jose]# zpool online tank hast/disk2
> > warning: device 'hast/disk2' onlined, but remains in faulted state
> > use 'zpool replace' to replace devices that are no longer present
> > [root_at_san1 /usr/home/jose]#
> >
> > I can't bring it back online. Can you guys help me figure out what to do?
> > This is a production server and I can't afford to bring it down.
> > I have already swapped 3 disks and got the same result.
> >
> > Thank you guys in advance.
>
> You forgot to call zpool replace as the last step in the process of
> replacing your faulted disk:
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html
> Cheers,
> -Garrett
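For the record, the distinction Garrett is pointing at: zpool online only clears the flag on a device that was administratively taken offline, while a vdev whose backing disk has been physically swapped has to go through zpool replace. Assuming /dev/hast/disk2 exists by the time it is run (see the sketch earlier in the thread), the missing last step would be roughly:

  zpool replace tank hast/disk2   # re-adds the device at that path and starts a resilver
  zpool status tank               # should show the vdev replacing, then resilvering to ONLINE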