Re: Testers wanted: Gvinum patches of SoC 2007 work

From: Barnabas <barnabasdk_at_gmail.com>
Date: Mon, 17 Sep 2007 16:18:42 -0700 (PDT)
Hi Ulf

I also use gvinum in a fairly complex setup and would be happy to help with
testing, bug hunting and whatever else is needed. I work as a developer - not
exactly a kernel hacker - but I am not completely at a loss.

I have had a lot of issues with the semi-finished version of gvinum that
currently ships with FreeBSD. Lately I have been hitting frequent DMA
read/write timeouts, especially under heavy load, but only on my ATA drives -
the SCSI drives run perfectly. At first I thought it was a dying disk, but I
have been seeing the same issue on my new raid5 setup as well, and the same
fault on two disks at the same time is not very likely. If you can tell me how
to patch my current vinum driver with the changes you made, that would be
great.
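
In case applying it is as simple as I hope, this is roughly what I had in
mind - just a sketch, and the patch level, the patch location and whether the
geom_vinum module can be rebuilt on its own against my 6.2 tree are all
guesses on my part:

  cd /usr/src
  patch -p0 < /path/to/gvinum.diff     (patch level and path are guesses)
  cd sys/modules/geom/geom_vinum
  make clean && make && make install
  shutdown -r now                      (a reboot, since my volumes are mounted)

If the patch only applies to -CURRENT, just say so and I will set up a
separate box to test on.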

I really appreciate that someone is looking at this excellent software. It has
not really worked 100% since the port to GEOM was started.

Here's my config:

FreeBSD sauron.barnabas.dk 6.2-RELEASE-p6 FreeBSD 6.2-RELEASE-p6 #0: Wed Jul 18 23:33:58 CEST 2007
    root_at_sauron.barnabas.dk:/usr/src/sys/i386/compile/KERNEL_6_2  i386

7 drives:
D elben                 State: up       /dev/da1s1h     A: 0/7825 MB (0%)
D donau                 State: up       /dev/da0s1h     A: 0/7825 MB (0%)
D raid5_4               State: up       /dev/ad11a      A: 6/194480 MB (0%)
D raid5_3               State: up       /dev/ad10a      A: 6/194480 MB (0%)
D raid5_2               State: up       /dev/ad9a       A: 6/194480 MB (0%)
D raid5_1               State: up       /dev/ad8a       A: 6/194480 MB (0%)
D spree                 State: up       /dev/ad4a       A: 3/114473 MB (0%)

6 volumes:
V raid5                 State: up       Plexes:       1 Size:        569 GB
V data01                State: up       Plexes:       1 Size:        111 GB
V usr                   State: up       Plexes:       2 Size:       5625 MB
V home                  State: up       Plexes:       2 Size:       1000 MB
V tmp                   State: up       Plexes:       2 Size:        600 MB
V var                   State: up       Plexes:       2 Size:        600 MB

10 plexes:
P raid5.p0           R5 State: degraded Subdisks:     4 Size:        569 GB
P data01.p0           C State: up       Subdisks:     1 Size:        111 GB
P usr.p1              C State: up       Subdisks:     1 Size:       5625 MB
P home.p1             C State: up       Subdisks:     1 Size:       1000 MB
P tmp.p1              C State: up       Subdisks:     1 Size:        600 MB
P var.p1              C State: up       Subdisks:     1 Size:        600 MB
P usr.p0              C State: up       Subdisks:     1 Size:       5625 MB
P home.p0             C State: up       Subdisks:     1 Size:       1000 MB
P tmp.p0              C State: up       Subdisks:     1 Size:        600 MB
P var.p0              C State: up       Subdisks:     1 Size:        600 MB

13 subdisks:
S raid5.p0.s3           State: stale    D: raid5_4      Size:        189 GB
S raid5.p0.s2           State: up       D: raid5_3      Size:        189 GB
S raid5.p0.s1           State: up       D: raid5_2      Size:        189 GB
S raid5.p0.s0           State: up       D: raid5_1      Size:        189 GB
S data01.p0.s0          State: up       D: spree        Size:        111 GB
S usr.p1.s0             State: up       D: elben        Size:       5625 MB
S home.p1.s0            State: up       D: elben        Size:       1000 MB
S tmp.p1.s0             State: up       D: elben        Size:        600 MB
S var.p1.s0             State: up       D: elben        Size:        600 MB
S usr.p0.s0             State: up       D: donau        Size:       5625 MB
S home.p0.s0            State: up       D: donau        Size:       1000 MB
S tmp.p0.s0             State: up       D: donau        Size:        600 MB
S var.p0.s0             State: up       D: donau        Size:        600 MB

# Vinum configuration of sauron.barnabas.dk, saved at Tue Sep 18 01:11:49 2007
# Current configuration:
# drive elben device /dev/da1s1h
# drive donau device /dev/da0s1h
# drive raid5_4 device /dev/ad11a
# drive raid5_3 device /dev/ad10a
# drive raid5_2 device /dev/ad9a
# drive raid5_1 device /dev/ad8a
# drive spree device /dev/ad4a
# volume raid5
# volume data01
# volume usr
# volume home
# volume tmp
# volume var
# plex name raid5.p0 org raid5 2048s vol raid5
# plex name data01.p0 org concat vol data01
# plex name usr.p1 org concat vol usr
# plex name home.p1 org concat vol home
# plex name tmp.p1 org concat vol tmp
# plex name var.p1 org concat vol var
# plex name usr.p0 org concat vol usr
# plex name home.p0 org concat vol home
# plex name tmp.p0 org concat vol tmp
# plex name var.p0 org concat vol var
# sd name raid5.p0.s3 drive raid5_4 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 6144s
# sd name raid5.p0.s2 drive raid5_3 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 4096s
# sd name raid5.p0.s1 drive raid5_2 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 2048s
# sd name raid5.p0.s0 drive raid5_1 len 398282752s driveoffset 265s plex raid5.p0 plexoffset 0s
# sd name data01.p0.s0 drive spree len 234434560s driveoffset 265s plex data01.p0 plexoffset 0s
# sd name usr.p1.s0 drive elben len 11521427s driveoffset 4505865s plex usr.p1 plexoffset 0s
# sd name home.p1.s0 drive elben len 2048000s driveoffset 2457865s plex home.p1 plexoffset 0s
# sd name tmp.p1.s0 drive elben len 1228800s driveoffset 1229065s plex tmp.p1 plexoffset 0s
# sd name var.p1.s0 drive elben len 1228800s driveoffset 265s plex var.p1 plexoffset 0s
# sd name usr.p0.s0 drive donau len 11521427s driveoffset 4505865s plex usr.p0 plexoffset 0s
# sd name home.p0.s0 drive donau len 2048000s driveoffset 2457865s plex home.p0 plexoffset 0s
# sd name tmp.p0.s0 drive donau len 1228800s driveoffset 1229065s plex tmp.p0 plexoffset 0s
# sd name var.p0.s0 drive donau len 1228800s driveoffset 265s plex var.p0 plexoffset 0s

As you can tell, the raid setup is not in good shape.
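
Once the drives reattach I have been trying to revive the stale subdisk
roughly like this - a sketch based on the gvinum(8) man page, and with the
in-tree gvinum I am honestly not sure the rebuild ever completes:

  gvinum list                  (check that raid5_4 is seen again)
  gvinum start raid5.p0.s3     (revive the stale subdisk)
  gvinum list                  (watch the rebuild progress)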

Here are some of the errors I have experienced:

Sep 17 16:54:03 sauron kernel: subdisk10: detached
Sep 17 16:54:03 sauron kernel: ad10: detached
Sep 17 16:54:03 sauron kernel: ad11: FAILURE - device detached
Sep 17 16:54:03 sauron kernel: subdisk11: detached
Sep 17 16:54:03 sauron kernel: ad11: detached
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: subdisk raid5.p0.s3 state change: up -> down
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: plex raid5.p0 state change: up -> degraded
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: subdisk raid5.p0.s2 state change: up -> down
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: plex raid5.p0 state change: degraded -> down
Sep 17 16:54:03 sauron kernel: GEOM_VINUM: lost drive 'raid5_3'
Sep 17 23:20:58 sauron sshd[15009]: refused connect from host11-69-static.30-87-b.business.telecomitalia.it (87.30.69.11)
Sep 17 23:22:12 sauron sshd[15039]: refused connect from host11-69-static.30-87-b.business.telecomitalia.it (87.30.69.11)
Sep 18 00:00:42 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025710592, length=16384)]error = 6
Sep 18 00:00:42 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025726976, length=49152)]error = 6
Sep 18 00:00:42 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025776128, length=131072)]error = 6
Sep 18 00:00:44 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=535918395392, length=49152)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025710592, length=16384)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025726976, length=49152)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=536025776128, length=131072)]error = 6
Sep 18 00:00:53 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=535918395392, length=49152)]error = 6
Sep 18 00:01:00 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=528017391616, length=32768)]error = 6
Sep 18 00:01:00 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=528108421120, length=49152)]error = 6
Sep 18 00:01:10 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:10 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:10 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[WRITE(offset=511212584960, length=16384)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[WRITE(offset=527976808448, length=16384)]error = 6
Sep 18 00:01:11 sauron kernel: g_vfs_done():gvinum/raid5[WRITE(offset=535877189632, length=16384)]error = 6
Sep 18 00:01:13 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:13 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:13 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6
Sep 18 00:01:15 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343853568, length=16384)]error = 6
Sep 18 00:01:15 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343869952, length=49152)]error = 6
Sep 18 00:01:15 sauron kernel: g_vfs_done():gvinum/raid5[READ(offset=511343919104, length=131072)]error = 6

I have seen exactly the same thing on the data01 disk, which is a stand-alone
drive.
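
When the timeouts hit I first check whether the kernel still sees the drives
at all, roughly like this (the device names below are from my box and may
well differ elsewhere):

  atacontrol list                   (do ad8-ad11 still show up?)
  camcontrol devlist                (da0 and da1 on the SCSI side)
  tail -n 50 /var/log/messages      (look for the detach / DMA timeout lines)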

Hope I am able to help.

Nikolaj Hansen





Ulf Lilleengen wrote:
> 
> Hi,
> 
> It's here! The new and hopefully better gvinum patch. This is perhaps my
> final patch of the work I've done during GSoC 2007 (the patch will be
> updated when I fix a bug). This doesn't mean I'll stop working on gvinum,
> but rather that I'm not adding more features until this gets into the
> tree. But for this to get into the tree, I need people to test it. _ALL_
> reports on how it works are good.
> 
> So, what should you test?
> 
> * Plain normal use.
> 
> * Mirror synchronization, rebuild of raid-5 arrays, growing of raid-5
>   arrays, etc. These should work and are probably the most tested, but
>   some weird combinations that I have not foreseen might show up.
> 
> * Try weird combinations to check if it crashes.
> 
> * Test mirror, concat, stripe and raid5 commands.
> 
> * Any issues with the usability aspect, e.g. whether the information
>   gvinum gives you is good enough to understand what it's doing, or
>   whether one way of doing things seems unnatural to you. I'd like to
>   hear all of this; no matter how bikesheddish it might sound, it might
>   be something that has been overlooked. These things are hard for the
>   people who have been developing it to test, since we know how it
>   "should" be used.
> 
> Before you head on, beware that the new gvinum does not pass messages
> back to the userland gvinum tool (so you won't see them in your
> terminal), because that is not very simple to do with the new event
> system.
> !! This means you'll have to look for messages in /var/log/messages !!
> 
> And thanks to everyone for the comments and help I've been getting
> during the summer.
> 
> -- 
> Ulf Lilleengen
> 
> _______________________________________________
> freebsd-current_at_freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe_at_freebsd.org"
> 
> 

Received on Mon Sep 17 2007 - 21:34:27 UTC
