Re: boot errors since upgrading to 12-current

From: Toomas Soome <tsoome_at_me.com>
Date: Wed, 15 Aug 2018 09:31:03 +0300
> On 15 Aug 2018, at 06:06, tech-lists <tech-lists_at_zyxst.net> wrote:
> 
> On 14/08/2018 21:16, Toomas Soome wrote:
>>> On 14 Aug 2018, at 22:37, tech-lists <tech-lists_at_zyxst.net> wrote:
>>> Hello,
>>> context: amd64, FreeBSD 12.0-ALPHA1 #0 r337682, ZFS. The system is
>>> *not* root-on-zfs. It boots to an SSD. The three disks indicated
>>> below are spinning rust.
>>> NAME        STATE     READ WRITE CKSUM
>>> storage     ONLINE       0     0     0
>>>   raidz1-0  ONLINE       0     0     0
>>>     ada1    ONLINE       0     0     0
>>>     ada2    ONLINE       0     0     0
>>>     ada3    ONLINE       0     0     0
>>> This machine was running 11.2 up until about a month ago.
>>> Recently I've seen this flash up on the screen before getting to
>>> the beastie screen:
>>> BIOS drive C: is disk0
>>> BIOS drive D: is disk1
>>> BIOS drive E: is disk2
>>> BIOS drive F: is disk3
>>> BIOS drive G: is disk4
>>> BIOS drive H: is disk5
>>> BIOS drive I: is disk6
>>> BIOS drive J: is disk7
>>> [the above is normal and has always been seen on every boot]
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> read 1 from 0 to 0xcbdb1330, error: 0x31
>>> the above has been happening since upgrading to -current a month
>>> ago
>>> ZFS: i/o error - all block copies unavailable
>>> ZFS: can't read MOS of pool storage
>>> the above is alarming and has been happening for the past couple of
>>> days, since upgrading to r337682 on the 12th August.
>>> The beastie screen then loads and it boots normally.
>>> Should I be concerned? Is the output indicative of a problem?
>> Not immediately, and yes. In the BIOS loader we do all disk I/O with
>> INT13, and error 0x31 often hints at missing media or some other
>> controller-related error. Could you paste the output of the loader
>> command lsdev -v?
>> The drive list appears as a result of probing the disks in
>> biosdisk.c. The read errors come from an attempt to read 1 sector
>> from sector 0 (that is, to read the partition table from the disk).
>> Why this ends with an error would be interesting to know;
>> unfortunately that error does not tell us which disk was probed.
> 
> Hi Toomas, thanks for looking at this.
> 
> lsdev -v looks like this:
> 
> OK lsdev -v
> disk devices:
> 	disk0: BIOS drive C (16514064 X 512):
> 	disk0s1: FreeBSD          111GB
> 	disk0s1a: FreeBSD UFS     108GB
> 	disk0s1b: FreeBSD swap    3881MB
> 
> 	disk1: BIOS drive D (16514064 X 512):
> 	disk2: BIOS drive E (16514064 X 512):
> 	disk3: BIOS drive F (16514064 X 512):
> 	disk4: BIOS drive G (2880 X 512):
> read 1 from 0 to 0xcbde0a20, error 0x31
> 	disk5: BIOS drive D (2880 X 512):
> read 1 from 0 to 0xcbde0a20, error 0x31
> 	disk6: BIOS drive D (2880 X 512):
> read 1 from 0 to 0xcbde0a20, error 0x31
> 	disk7: BIOS drive D (2880 X 512):
> read 1 from 0 to 0xcbde0a20, error 0x31
> OK
> 
> disk4 to disk7 correspond to da0 to da3, which are sd/mmc devices without any media in. What made me notice is that it never showed the read 1 from 0 to $random_value on 11-stable. The system runs 12-current now.

Yeah, it's not about a random value, but the rework to handle missing media is on its way to current, stay tuned :)
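
For anyone who wants to check which disks the read errors belong to, here is a rough Python sketch that walks over lsdev -v output. It assumes that an error line belongs to the disk line printed just before it, which matches the listing quoted above but is only an assumption about the loader's print order. The names failed_probes, DISK_RE and ERR_RE are made up for this example and are not part of the loader.

```python
import re

# Sample copied from the lsdev -v output quoted above (quote markers removed).
LSDEV = """\
disk3: BIOS drive F (16514064 X 512):
disk4: BIOS drive G (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
disk5: BIOS drive D (2880 X 512):
read 1 from 0 to 0xcbde0a20, error 0x31
"""

# "diskN: BIOS drive X (sectors X sector_size):" lines; partition lines
# such as "disk0s1: ..." deliberately do not match.
DISK_RE = re.compile(r"^(disk\d+): BIOS drive (\w) \((\d+) X (\d+)\):$")
# The colon after "error" is optional: it is present in the boot-time
# messages and absent in the lsdev -v listing.
ERR_RE = re.compile(
    r"^read (\d+) from (\d+) to (0x[0-9a-f]+), error:? (0x[0-9a-f]+)$"
)


def failed_probes(text):
    """Map each disk name to the error code of the read that followed it.

    Assumption: an error line refers to the most recent disk line.
    """
    current = None
    failures = {}
    for raw in text.splitlines():
        line = raw.strip()
        disk = DISK_RE.match(line)
        if disk:
            current = disk.group(1)
            continue
        err = ERR_RE.match(line)
        if err and current is not None:
            failures[current] = int(err.group(4), 16)
    return failures


if __name__ == "__main__":
    for name, code in failed_probes(LSDEV).items():
        print(f"{name}: error {code:#x}")
```

Run against the full listing, this should flag only disk4 to disk7, the card reader slots without media.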

> 
> disk1 to disk3 are the hard drives making up ZFS. These are 4TB Western Digital SATA-3 WDC WD4001FAEX.

Well, that does explain the problem, if you look at the sizes reported: your BIOS is reporting wrong sizes and is unable to access the whole 4TB space, so the ZFS reader is not getting the correct data from the disks and ends up with errors. That's why you get the errors from the ‘storage’ pool, and yes, this is harmless for boot because you have a separate (small) disk for the boot.
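
To put numbers on the size mismatch, here is a small Python sketch of the arithmetic. The sector count and sector size are taken from the lsdev -v listing; treating "4TB" as 4 * 10**12 bytes is an assumption, and the helper name reported_bytes is made up for this example. The observation that 16514064 equals 16383 * 16 * 63 (the usual legacy CHS ceiling) is my own reading of the figure, not something the loader prints.

```python
SECTOR_SIZE = 512            # from "(16514064 X 512)" in lsdev -v
REPORTED_SECTORS = 16514064  # same value for disk0 through disk3
NOMINAL_BYTES = 4 * 10**12   # assumption: "4TB" means 4e12 bytes


def reported_bytes(sectors, sector_size=SECTOR_SIZE):
    """Size in bytes that the BIOS geometry implies."""
    return sectors * sector_size


if __name__ == "__main__":
    size = reported_bytes(REPORTED_SECTORS)
    print(f"BIOS-reported size: {size} bytes ({size / 2**30:.2f} GiB)")
    print(f"Share of a nominal 4TB disk: {size / NOMINAL_BYTES:.4%}")
    # 16383 cylinders * 16 heads * 63 sectors is the classic CHS limit.
    print("Matches 16383*16*63:", REPORTED_SECTORS == 16383 * 16 * 63)
```

So the loader sees only about 8.4 GB of each disk, a small fraction of a percent of the real capacity, and any block that ZFS needs beyond that range cannot be read through this BIOS interface.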

rgds,
toomas

> 
>>> Since you are getting errors from the data pool ‘storage’, it does
>>> not affect the boot. Why the pool storage is unreadable likely has
>>> to do with the errors above, but I cannot tell for sure based on the
>>> data presented here.
> 
> Thing is, the data pool works fine when boot completes. i.e it loads read/write and behaves normally.
> 
> thanks,
> -- 
> J.
Received on Wed Aug 15 2018 - 04:31:08 UTC
