Re: access to hard drives is "blocked" by writes to a flash drive

From: Ian Lepore <ian_at_FreeBSD.org>
Date: Tue, 05 Mar 2013 09:18:33 -0700

On Mon, 2013-03-04 at 21:33 -0800, Don Lewis wrote:
> On  4 Mar, Ian Lepore wrote:
> > On Sun, 2013-03-03 at 20:28 +0000, Steven Hartland wrote:
> >> ----- Original Message ----- 
> >> From: "Ian Lepore" <ian_at_FreeBSD.org>
> >> To: "Poul-Henning Kamp" <phk_at_phk.freebsd.dk>
> >> Cc: "deeptech71" <deeptech71_at_gmail.com>; <freebsd-current_at_FreeBSD.org>; "Peter Jeremy" <peter_at_rulingia.com>
> >> Sent: Sunday, March 03, 2013 1:54 PM
> >> Subject: Re: access to hard drives is "blocked" by writes to a flash drive
> >> 
> >> 
> >> > On Sun, 2013-03-03 at 13:35 +0000, Poul-Henning Kamp wrote:
> >> >> In message <1362317291.1195.216.camel_at_revolution.hippie.lan>, Ian Lepore writes:
> >> >> 
> >> >> >I run into this behavior all the time too, mostly on arm systems that
> >> >> >have an sd card or usb thumb drive as their main/only drive.
> >> >> 
> >> >> This is really a FAQ and I believe I have answered it N times already:
> >> >> 
> >> >> There are, broadly speaking, two classes of flash-storage: "Camera-grade"
> >> >> and "the real thing".
> >> >> 
> >> >> "Camera-grade" have a very limited "Flash adaptation layer" which typically
> >> >> only can hold one flash-block open for writing at a time, and is typically
> >> >> found in CF and SD cards, USB sticks etc.
> >> >> 
> >> >> Some of them get further upset if the filesystem is not the FAT they
> >> >> expect, because they implement the (patented) "M-Systems" trick of monitoring
> >> >> block deletes in FAT to simulate a TRIM facility.
> >> >> 
> >> >> A number of products exist in which such a design, typically CF-style, is
> >> >> put behind a SATA-PATA bridge and sold as a 2.5" SATA SSD.
> >> >> "Transcend" has done this, for instance.
> >> >> 
> >> >> If you use this class of devices for anything real, gstat will show
> >> >> you I/O write-times of several seconds in periodic pile-ups, even
> >> >> 100 seconds if you are doing something heavy.
> >> >> 
> >> >> For various reasons (see: Lemming-syncer) FreeBSD will block all I/O
> >> >> traffic to other disks too, when these pileups get too bad.
> >> > 
> >> > Hmmm, so the problem has been known and unfixed for 10 years.  That's
> >> > not encouraging.  One of the messages in the lemming-syncer mail thread
> >> > might explain why I've been seeing this a lot lately in hobbyist work,
> >> > but not so much at $work where we use sd cards heavily... we use very
> >> > short syncer timeouts on SD and CF storage at $work:
> >> > 
> >> > kern.metadelay: 3
> >> > kern.dirdelay: 4
> >> > kern.filedelay: 5
> >> > 
> >> > I might play with similar settings on some of my arm boards here.
> >> 
> >> Interesting, are these relevant for all filesystems e.g. ZFS?
> >> 
> >>     Regards
> >>     Steve
> > 
> > I'm not sure, I know almost nothing about zfs.  I do know we used those
> > tunings for a specific reason and I sure wouldn't recommend them for
> > general use.  There's a comment block at the top of kern/vfs_subr.c with
> > some information on those delay values and how they're used that you
> > might find useful.  I think in general such small numbers on a system
> > doing lots of IO would be counter-productive.
> > 
> > In our case we arrived at those tunings this way...  We have embedded
> > systems with CF and SD cards as their only mass storage, and we do a
> > relatively small amount of writing to the cards (occasional config
> > changes, low-volume routine logging via syslog, not much else).
> > Occasionally when newsyslog kicks in there'll be a short burst of IO to
> > compress and rotate, then back to a trickle again.  Once upon a time we
> > mounted the filesystem without softupdates and with the sync option.
> > That was fairly robust against users pulling the plug right after making
> > a config change, but very very slow during syslog rotation, sometimes to
> > the point of perturbing our apps (the CF cards run in PIO mode, and a
> > burst of PIO activity is hard on time-critical apps).
> > 
> > So we switched to using softupdates and turned off the sync option.  That
> > was nicer to our apps, but left a long window during which updates
> > didn't get flushed to the card.  So we lowered those 3 tuning values to
> > the lowest numbers supported by the code (as I vaguely remember it, this
> > was years ago), not to get any sort of change in performance, but to
> > reduce the window during which data just sat around in memory waiting
> > for potential further updates before being flushed.  The further updates
> > would never happen for us, so the long delays had no benefit.
> 
> This tuning could potentially increase the amount of I/O that actually
> occurs.  The only advantage would be that large files that are
> sequentially written would be flushed to disk more frequently but in
> smaller amounts.
> 

I don't think so, in our case.  The mechanism that would trigger more IO
would be the lack of opportunity to elide multiple rewrites of the same
blocks that occur within the delay windows, and that situation just
doesn't come up in any significant way in our products.
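
As a rough illustration of that elision argument, here is a toy model in
Python.  It is not the actual syncer logic in kern/vfs_subr.c; the
function name count_flushes and the rule "one physical write per dirty
period" are assumptions made purely for the sketch, and the 30-second
delay is just an illustrative longer value.

```python
def count_flushes(writes, delay):
    """Count physical block writes under a toy delayed-write model.

    writes: iterable of (time_seconds, block_number), sorted by time.
    delay:  seconds a dirty block waits in memory before being flushed.

    A rewrite of a block that is still dirty is absorbed into the flush
    that is already pending, so it costs no additional I/O.
    """
    flush_due = {}  # block -> time at which its pending flush happens
    flushes = 0
    for t, block in writes:
        due = flush_due.get(block)
        if due is not None and t < due:
            continue  # block is still dirty: this rewrite is elided
        flushes += 1  # this write starts a new dirty period
        flush_due[block] = t + delay
    return flushes


# Trickle workload: one append to the same block every 60 seconds.
sparse = [(60 * i, 7) for i in range(10)]
# Hot-block workload: the same block rewritten once a second for a minute.
busy = [(i, 7) for i in range(60)]

print(count_flushes(sparse, 5), count_flushes(sparse, 30))  # 10 10
print(count_flushes(busy, 5), count_flushes(busy, 30))      # 12 2
```

In this model the trickle workload costs the same number of writes with
either delay, because no rewrite ever lands inside a delay window.  Only
the hot-block workload pays for the shorter delay.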

We write to the card on the order of once every 10-180 seconds, usually
1 or 2 blocks updated, like appending a few lines to /var/log/messages
or saving a 900 byte config file.  About the biggest thing that ever
gets written at once is the compressed output of rotating a log file,
which means over the course of a second or two we write 20-50 kbytes,
then go back to nothing for many seconds.

In other words, the point I was trying to make was that these numbers
are very much a special-case tuning based on carefully studying our
situation and needs, and they are NOT good numbers to "just try" on any
normal kind of system.  I mentioned them in the first place only in
conjunction with some idle speculation that maybe they accidentally cause
better responsiveness on those systems (which is something I've noticed
if I'm doing something unusual like untarring a file or other developer
type stuff), and if so, maybe there's a clue there about the nature of
the unresponsiveness.

> To avoid the potential problem of lost config changes, could you put a
> wrapper around the editor to fsync the config file?
> 

It's not a "the editor" situation.  Anything written to the sd card we
want to get flushed out to the card "pretty quickly" regardless of what
the source of the write is.  We used to define "pretty quickly" as
"mount with option sync" and then we decided that a couple seconds of
latency was acceptable.  Given that the write latency exists only as an
opportunity for optimizations that generally just can't occur in our
workload, long delays were all downside for us.
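
For reference, the per-file approach suggested above could look roughly
like the following Python sketch.  The helper name write_durably and the
".tmp" suffix are hypothetical, and it only covers writers that can be
changed to call it, which is exactly the limitation described above.

```python
import os


def write_durably(path, data):
    """Replace path with data, returning only after it has been fsync()ed."""
    tmp = path + ".tmp"  # hypothetical temp-file naming for this sketch
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer down to the kernel
        os.fsync(f.fileno())   # ask the kernel to push the data to the device
    os.replace(tmp, path)      # atomically swap the new file into place

    # fsync the containing directory so the rename itself is on stable storage.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A caller would use it as write_durably("/path/to/app.conf", new_bytes),
where the path is only a placeholder.  Writes from syslogd, newsyslog and
anything else that does not go through such a helper still wait for the
syncer, which is why shortening the delays was the route taken here.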

-- Ian
Received on Tue Mar 05 2013 - 15:18:39 UTC
