Re: ZFS patches.

From: Claus Guttesen <kometen_at_gmail.com>
Date: Fri, 1 Aug 2008 14:19:35 +0200
>> The patch above contains the most recent ZFS version that could be found
>> in OpenSolaris as of today. Apart from a large amount of new functionality,
>> I believe there are many stability (and also performance) improvements
>> compared to the version in the base system.
>>
>> Please test, test, test. If I get enough positive feedback, I may be
>> able to squeeze it into 7.1-RELEASE, but this might be hard.
>>
>
> I applied your patch to -current as of July 31st. I had to
> remove /usr/src, perform a clean csup, and remove the two empty
> files mentioned in this thread.
>
> I have an Areca ARC-1680 SAS card and an external SAS cabinet with 16
> SAS drives of 1 TB each (931 binary GB). They are set up as three
> raidz vdevs of five disks each in one zpool, plus one spare.
>
> There does seem to be a speed improvement. I NFS-mounted a partition
> from Solaris 9 on SPARC and am copying approx. 400 GB using rsync. I
> saw writes of 429 MB/s. The spikes occurred every 10 secs to begin
> with. After some minutes I got writes almost every second (watching
> zpool iostat 1). The limit is clearly the network connection between
> the two hosts. I'll do some internal copying later.
>
> It's too early to say whether ZFS is stable (enough), although I
> haven't been able to make it halt unless I removed a disk. That was
> with version 6. I'll remove a disk tomorrow and see how it goes.

Replying to my own mail! :-)

My conclusion about its stability was a bit hasty. I was copying
approx. 400 GB from an NFS share mounted from Solaris 9 on SPARC,
using TCP and read and write sizes of 32768. The files are images of
slightly less than 1 MB each plus a thumbnail (approx. 983,000 files).
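
For reference, the mount and copy were along these lines (the host
name and paths below are just placeholders, not the real ones):

  # mount the Solaris 9 export over TCP with 32k read/write sizes
  mount_nfs -T -r 32768 -w 32768 sparcbox:/export/images /mnt/images
  # copy the image tree onto the pool
  rsync -a /mnt/images/ /ef1/images/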

During creation of my pool I saw these warnings:

WARNING pid 1065 (zfs): ioctl sign-extension ioctl ffffffffcc285a18
WARNING pid 1067 (zfs): ioctl sign-extension ioctl ffffffffcc285a18
WARNING pid 1069 (zfs): ioctl sign-extension ioctl ffffffffcc285a15
WARNING pid 1070 (zfs): ioctl sign-extension ioctl ffffffffcc285a15
WARNING pid 1076 (zfs): ioctl sign-extension ioctl ffffffffcc285a19
WARNING pid 1077 (zfs): ioctl sign-extension ioctl ffffffffcc285a18
WARNING pid 1079 (zfs): ioctl sign-extension ioctl ffffffffcc285a15

Twice during the copy (rsync), access to the pool stopped.

I captured the top output during the first and second incidents:

last pid:  4287;  load averages:  0.00,  0.17,  0.48    up 0+03:02:30  00:27:42
33 processes:  1 running, 32 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 43M Active, 6350M Inact, 1190M Wired, 220M Cache, 682M Buf, 130M Free
Swap: 8192M Total, 16K Used, 8192M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 4237 www         1  58    0 23056K 17328K tx->tx 2   0:07  0.00% rsync
 4159 root        1  44    0 14336K  3476K pause  0   0:00  0.00% zsh
 3681 claus       1  44    0 14480K  3524K pause  1   0:00  0.00% zsh
 4154 claus       1  44    0 36580K  3768K select 1   0:00  0.00% sshd
 4273 claus       1  44    0 14480K  3552K pause  3   0:00  0.00% zsh
 4125 www         1  44    0 14600K  3584K ttyin  1   0:00  0.00% zsh
 4120 root        1  44    0 12992K  3088K pause  0   0:00  0.00% zsh
 4156 claus       1  46    0 13140K  3196K pause  2   0:00  0.00% zsh
 4284 root        1  44    0 12992K  3264K pause  1   0:00  0.00% zsh
 3679 claus       1  44    0 36580K  3612K select 2   0:00  0.00% sshd
 1016 root        1  44    0  6768K  1168K nanslp 0   0:00  0.00% cron
 3676 root        1  46    0 36580K  3624K sbwait 0   0:00  0.00% sshd
 4150 root        1  45    0 36580K  3780K sbwait 2   0:00  0.00% sshd
  793 root        1  44    0  5712K  1164K select 1   0:00  0.00% syslogd
 4268 root        1  45    0 36580K  3896K sbwait 1   0:00  0.00% sshd
 4271 claus       1  44    0 36580K  3892K select 2   0:00  0.00% sshd
 4287 root        1  44    0  8140K  1896K CPU0   0   0:00  0.00% top
 4123 root        1  45    0 20460K  1412K wait   0   0:00  0.00% su
 1007 root        1  44    0 24652K  2788K select 1   0:00  0.00% sshd

last pid:  2812;  load averages:  0.01,  0.53,  0.87    up 0+01:01:45  10:03:55
34 processes:  1 running, 33 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 150M Active, 166M Inact, 1469M Wired, 40K Cache, 680M Buf, 6147M Free
Swap: 8192M Total, 8192M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 2787 www         1  44    0   117M    99M select 3   0:05  0.00% rsync
 2785 www         1  65    0   117M   100M zio->i 1   0:05  0.00% rsync
 1326 root        1  44    0 14500K  2300K nanslp 3   0:02  0.00% zpool
 1195 claus       1  44    0  8140K  2704K CPU0   0   0:01  0.00% top
 1224 www         1  65    0 14804K  4576K pause  1   0:00  0.00% zsh
 2786 www         1  44    0 98832K 87432K select 0   0:00  0.00% rsync
 1203 claus       1  44    0 36580K  5320K select 0   0:00  0.00% sshd
 1155 claus       1  44    0 14608K  4408K pause  1   0:00  0.00% zsh
 1177 claus       1  44    0 36580K  5320K select 1   0:00  0.00% sshd
 1208 root        1  44    0 15392K  4292K pause  2   0:00  0.00% zsh
 1153 claus       1  44    0 36580K  5320K select 3   0:00  0.00% sshd
 2708 claus       1  44    0 13140K  3976K ttyin  3   0:00  0.00% zsh
 1219 root        1  44    0 12992K  3892K pause  0   0:00  0.00% zsh
 1179 claus       1  44    0 13140K  3976K pause  2   0:00  0.00% zsh
 1205 claus       1  47    0 13140K  3976K pause  1   0:00  0.00% zsh
 1146 root        1  45    0 36580K  5284K sbwait 1   0:00  0.00% sshd
 2703 root        1  46    0 36580K  5276K sbwait 0   0:00  0.00% sshd
 1171 root        1  46    0 36580K  5276K sbwait 0   0:00  0.00% sshd
 1200 root        1  46    0 36580K  5276K sbwait 0   0:00  0.00% sshd
  795 root        1  44    0  5712K  1412K select 0   0:00  0.00% syslogd
 1018 root        1  44    0  6768K  1484K nanslp 2   0:00  0.00% cron
 2706 claus       1  44    0 36580K  5320K select 1   0:00  0.00% sshd
 1222 root        1  45    0 20460K  1840K wait   1   0:00  0.00% su

When the copy was complete I copied the same data to a different ZFS
filesystem. It stopped once, and I saw the following in dmesg:

Aug  1 09:22:02 malene root: ZFS: checksum mismatch, zpool=ef1
path=/dev/da4 offset=294400 size=512
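
For what it's worth, the usual way to follow up on a message like that
would be something along these lines:

  # show per-vdev read/write/checksum error counters
  zpool status -v ef1
  # walk all data and let raidz repair any bad blocks it finds
  zpool scrub ef1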

The zpool was defined with three raidz vdevs of five disks each and
one spare.
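
Roughly, the pool was created along these lines (the da device numbers
here are from memory and may not be the exact ones):

  zpool create ef1 \
      raidz da0  da1  da2  da3  da4  \
      raidz da5  da6  da7  da8  da9  \
      raidz da10 da11 da12 da13 da14 \
      spare da15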

I need to get some storage available very soon, so I reinstalled the
server with Solaris Express b79. Zpool information (from Solaris):

zpool status
  pool: ef1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ef1         ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d0  ONLINE       0     0     0
            c3t0d1  ONLINE       0     0     0
            c3t0d2  ONLINE       0     0     0
            c3t0d3  ONLINE       0     0     0
            c3t0d4  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t0d5  ONLINE       0     0     0
            c3t0d6  ONLINE       0     0     0
            c3t0d7  ONLINE       0     0     0
            c3t1d0  ONLINE       0     0     0
            c3t1d1  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c3t1d2  ONLINE       0     0     0
            c3t1d3  ONLINE       0     0     0
            c3t1d4  ONLINE       0     0     0
            c3t1d5  ONLINE       0     0     0
            c3t1d6  ONLINE       0     0     0
        spares
          c3t1d7    AVAIL

errors: No known data errors

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare