RE: zpool can't bring online disk2 ----I screwed up

From: Jose A. Lombera <jose_at_lajni.com>
Date: Sun, 23 Sep 2012 22:50:28 -0700
This is the error I got when I ran the failover script.

Sep 24 06:43:39 san1 hastd[3404]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.
Sep 24 06:43:39 san1 hastd[3343]: [disk3] (primary) Worker process exited ungracefully (pid=3404, exitcode=66).
Sep 24 06:43:39 san1 hastd[3413]: [disk6] (primary) Provider /dev/mfid6 is not part of resource disk6.
Sep 24 06:43:39 san1 hastd[3343]: [disk6] (primary) Worker process exited ungracefully (pid=3413, exitcode=66).
Sep 24 06:43:39 san1 hastd[3425]: [disk10] (primary) Unable to open /dev/mfid10: No such file or directory.
Sep 24 06:43:39 san1 hastd[3407]: [disk4] (primary) Provider /dev/mfid4 is not part of resource disk4.
Sep 24 06:43:39 san1 hastd[3343]: [disk10] (primary) Worker process exited ungracefully (pid=3425, exitcode=66).
Sep 24 06:43:39 san1 hastd[3410]: [disk5] (primary) Provider /dev/mfid5 is not part of resource disk5.
Sep 24 06:43:39 san1 hastd[3343]: [disk4] (primary) Worker process exited ungracefully (pid=3407, exitcode=66).
Sep 24 06:43:39 san1 hastd[3416]: [disk7] (primary) Provider /dev/mfid7 is not part of resource disk7.
Sep 24 06:43:39 san1 hastd[3422]: [disk9] (primary) Provider /dev/mfid9 is not part of resource disk9.
Sep 24 06:43:39 san1 hastd[3419]: [disk8] (primary) Provider /dev/mfid8 is not part of resource disk8.
Sep 24 06:43:39 san1 hastd[3343]: [disk5] (primary) Worker process exited ungracefully (pid=3410, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk9] (primary) Worker process exited ungracefully (pid=3422, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk8] (primary) Worker process exited ungracefully (pid=3419, exitcode=66).
Sep 24 06:43:40 san1 hastd[3343]: [disk7] (primary) Worker process exited ungracefully (pid=3416, exitcode=66).
Sep 24 06:43:40 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:45 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:43:50 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:43:55 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:00 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).
Sep 24 06:44:05 san1 hastd[3348]: [disk1] (primary) Split-brain condition!
Sep 24 06:44:10 san1 hastd[3351]: [disk2] (primary) Resource unique ID mismatch (primary=2635341666474957411, secondary=5944493181984227803).

Is there any patch I need to run to fix this issue?

From: Jose A. Lombera [mailto:jose_at_lajni.com] 
Sent: Sunday, September 23, 2012 10:00 PM
To: freebsd-current_at_freebsd.org
Cc: freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Every time I run this for any of the disks 3,4,5,6,7,8,9,10 (only disks 1 and 2 show up in /dev/hast):

[root_at_san2 /usr/home/jose]# hastctl role primary disk3
[root_at_san2 /usr/home/jose]#

I got this in the logs:

Sep 23 21:58:13 san2 hastd[2793]: [disk3] (primary) Provider /dev/mfid3 is not part of resource disk3.

Please help.

Thanks.

From: Jose A. Lombera [mailto:jose_at_lajni.com] 
Sent: Sunday, September 23, 2012 9:46 PM
To: 'Freddie Cash'
Cc: freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2 ----I screwed up

Please, someone help me!

I screwed up big time.

I was running:

hastctl create disk2

But since I got some input/output errors, I decided to stop hastd (/etc/rc.d/hastd stop).
Since it couldn't stop disk1 and disk9, I killed it.
Restarted both servers.

And now /dev/hast shows nothing.
And the pool is lost.

I was able to create disk2.
I have restarted both servers but the pool is not coming up.

Any suggestions? Please help. I know the data is still there, since I only ran "hastctl create disk2"; I haven't done it for the other disks.

From: Jose A. Lombera [mailto:jose_at_lajni.com] 
Sent: Sunday, September 23, 2012 8:10 PM
To: 'Freddie Cash'
Cc: freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2

Freddie,

Thanks for your great help; it makes so much more sense now.
I still have a small problem, and I'm not sure if it is because hastd is running:
I can't initialize disk2 (hastctl create disk2).

This is what I did:

1. zpool offline tank hast/disk2

2. zpool status -x

[root_at_san /usr/home/jose]# zpool status -x
  pool: tank
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
config:

        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          raidz1-0                DEGRADED     0     0     0
            hast/disk1            ONLINE       0     0     0
            11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
            hast/disk3            ONLINE       0     0     0
            hast/disk4            ONLINE       0     0     0
            hast/disk5            ONLINE       0     0     0
            hast/disk6            ONLINE       0     0     0
            hast/disk7            ONLINE       0     0     0
            hast/disk8            ONLINE       0     0     0
            hast/disk9            ONLINE       0     0     0
            hast/disk10           ONLINE       0     0     0

errors: No known data errors

3. Removed the disk / inserted a new one.

4. Initialize:

hastctl role init disk2

[root_at_san /usr/home/jose]# hastctl status disk2
disk2:
  role: init
  provname: disk2
  localpath: /dev/mfid2
  extentsize: 0 (0B)
  keepdirty: 0
  remoteaddr: san1
  replication: fullsync
  dirty: 0 (0B)
  statistics:
    reads: 0
    writes: 0
    deletes: 0
    flushes: 0
    activemap updates: 0
[root_at_san /usr/home/jose]#
[root_at_san /usr/home/jose]# hastctl create disk2
[ERROR] [disk2] Unable to write metadata: Input/output error.
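In case it helps narrow things down, I suppose I can at least sanity-check the new drive itself before retrying the create; something like this, assuming the replacement really did come up as /dev/mfid2 (read-only checks, nothing gets written):

diskinfo -v /dev/mfid2                        # does the provider exist, and what size does it report?
dd if=/dev/mfid2 of=/dev/null bs=1m count=16  # quick read test of the start of the disk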

 

 

 

I don't want to stop hastd, since it will shut down the connection to my SAN.

Do you have any suggestions?

Thanks,

--jose

-----Original Message-----
From: owner-freebsd-current_at_freebsd.org [mailto:owner-freebsd-current_at_freebsd.org] On Behalf Of Freddie Cash
Sent: Sunday, September 23, 2012 6:30 PM
To: compufutura -the computer of the future
Cc: yanegomi_at_gmail.com; freebsd-current_at_freebsd.org
Subject: RE: zpool can't bring online disk2

Since it's a HAST device, you have to initialise the disk via hastctl. Once that is done, the /dev/hast/disk2 GEOM device node will be created. Then you can 'zpool replace' it.
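
Something along these lines, with the pool and resource names adjusted to your setup (just a sketch of the order of operations, not an exact recipe):

zpool offline tank hast/disk2    # take the failed member out of the pool
                                 # ... physically swap the bad disk for the new one ...
hastctl create disk2             # initialise the new provider as a HAST resource
hastctl role primary disk2       # bring the resource up so /dev/hast/disk2 exists again
zpool replace tank hast/disk2    # resilver onto the new HAST device, then wait for it to finish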

 

One step at a time. :)  And you've skipped a few.

1. 'zpool offline' the defective disk
2. Physically remove the defective disk
3. Physically insert the new disk
4. Initialise it as a HAST resource via 'hastctl'
5. 'zpool replace' it using the /dev/hast node
6. Wait for the pool (and HAST) to resilver it
7. Carry on as per normal

On Sep 23, 2012 2:28 PM, "compufutura -the computer of the future" <jose_at_compufutura.com> wrote:

 

> Yanegomi,
>
> I tried that; as you can see below, FreeBSD doesn't have the cfgadm
> utility to unconfigure the device according to
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html, and I
> looked in ports but there is no utility like that.
>
> Pardon me, my knowledge is limited.
>
> Can you please type the command I will need, or if I do need cfgadm, do I
> have to look for it and install it on my FreeBSD box?
>
> Thanks.
>
> [root_at_san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root_at_san1 /usr/home/jose]# zpool status -x
>   pool: tank
> state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
> scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
>
> [root_at_san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
>
> [root_at_san1 /usr/home/jose]# cfgadm
> bash: cfgadm: command not found
> [root_at_san1 /usr/home/jose]#
>
> [root_at_san1 /usr/home/jose]# zpool offline tank hast/disk2
> [root_at_san1 /usr/home/jose]# zpool status -x
>   pool: tank
> state: DEGRADED
> status: One or more devices has been taken offline by the administrator.
>         Sufficient replicas exist for the pool to continue functioning in a
>         degraded state.
> action: Online the device using 'zpool online' or replace the device with
>         'zpool replace'.
> scan: scrub repaired 0 in 12h4m with 0 errors on Sun Sep 23 19:14:19 2012
> config:
>
>         NAME                      STATE     READ WRITE CKSUM
>         tank                      DEGRADED     0     0     0
>           raidz1-0                DEGRADED     0     0     0
>             hast/disk1            ONLINE       0     0     0
>             11919832608590631234  OFFLINE      0     0     0  was /dev/hast/disk2
>             hast/disk3            ONLINE       0     0     0
>             hast/disk4            ONLINE       0     0     0
>             hast/disk5            ONLINE       0     0     0
>             hast/disk6            ONLINE       0     0     0
>             hast/disk7            ONLINE       0     0     0
>             hast/disk8            ONLINE       0     0     0
>             hast/disk9            ONLINE       0     0     0
>             hast/disk10           ONLINE       0     0     0
>
> errors: No known data errors
>
> [root_at_san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root_at_san1 /usr/home/jose]# zpool replace tank hast/disk2
> cannot open 'hast/disk2': no such GEOM provider
> must be a full path or shorthand device name
> [root_at_san1 /usr/home/jose]#
>
> From: Garrett Cooper <yanegomi_at_gmail.com>
> Date: September 23, 2012 12:25:52 PM PDT
> To: "Jose A. Lombera" <jose_at_lajni.com>
> Cc: freebsd-current_at_freebsd.org
> Subject: Re: zpool can't bring online disk2
>
> On Sun, Sep 23, 2012 at 11:23 AM, Jose A. Lombera <jose_at_lajni.com> wrote:
>
> Hello all,
>
> I hope someone can help me out with this.
>
> Recently disk2 went bad. I used
>
> zpool offline tank hast/disk2
>
> to bring the disk offline. Then I replaced it, and used the command
>
> zpool online tank hast/disk2
>
> but the disk shows REMOVED.
>
> [root_at_san1 /usr/home/jose]# zpool status -v
>  pool: tank
> state: DEGRADED
> status: One or more devices has been removed by the administrator.
>        Sufficient replicas exist for the pool to continue functioning in a
>        degraded state.
> action: Online the device using 'zpool online' or replace the device with
>        'zpool replace'.
> scan: resilvered 2.49M in 0h2m with 0 errors on Sat Sep 22 01:03:13 2012
> config:
>
>        NAME                      STATE     READ WRITE CKSUM
>        tank                      DEGRADED     0     0     0
>          raidz1-0                DEGRADED     0     0     0
>            hast/disk1            ONLINE       0     0     0
>            11919832608590631234  REMOVED      0     0     0  was /dev/hast/disk2
>            hast/disk3            ONLINE       0     0     0
>            hast/disk4            ONLINE       0     0     0
>            hast/disk5            ONLINE       0     0     0
>            hast/disk6            ONLINE       0     0     0
>            hast/disk7            ONLINE       0     0     0
>            hast/disk8            ONLINE       0     0     0
>            hast/disk9            ONLINE       0     0     0
>            hast/disk10           ONLINE       0     0     0
>
> [root_at_san1 /usr/home/jose]# zpool online tank hast/disk2
> warning: device 'hast/disk2' onlined, but remains in faulted state
> use 'zpool replace' to replace devices that are no longer present
> [root_at_san1 /usr/home/jose]#
>
> I can't bring it back online.
>
> Can you guys help me figure out what to do?
>
> This is a production server and I can't afford to bring the server down.
>
> I have already swapped 3 disks and got the same result.
>
> Thank you guys in advance.
>
> You forgot to call 'zpool replace' as the last step in the process of
> replacing your faulted disk:
> http://docs.oracle.com/cd/E19253-01/819-5461/gbcet/index.html
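>
> In this case that would be something along the lines of the following
> (only a sketch; the hast/disk2 node has to exist before zpool can open it):
>
> zpool replace tank hast/disk2
>
> or, if the old vdev only shows up by its numeric GUID in 'zpool status':
>
> zpool replace tank 11919832608590631234 hast/disk2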

> Cheers,
> -Garrett


_______________________________________________
freebsd-current_at_freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
Received on Mon Sep 24 2012 - 03:50:32 UTC
