John Baldwin wrote:
> On Tuesday, January 31, 2012 12:21:07 pm Ulrich Spörlein wrote:
> > On Mon, 2012-01-30 at 09:36:45 -0500, John Baldwin wrote:
> > > On Sunday, January 29, 2012 10:08:10 am Tijl Coosemans wrote:
> > > > On Wednesday 25 January 2012 17:29:22 John Baldwin wrote:
> > > > > On Friday, January 20, 2012 2:12:13 pm John Baldwin wrote:
> > > > >> On Thursday, January 19, 2012 11:39:42 am Tijl Coosemans wrote:
> > > > >>> I recently noticed that multimedia/vlc generates a lot of
> > > > >>> disk IO when playing media files. For instance, when playing
> > > > >>> a 320kbps mp3 gstat reports about 1250kBps (=10000kbps).
> > > > >>> That's quite a lot of overhead.
> > > > >>>
> > > > >>> It turns out that vlc sets POSIX_FADV_NOREUSE on the entire
> > > > >>> file and reads in chunks of 1028 bytes. FreeBSD implements
> > > > >>> NOREUSE as if O_DIRECT was specified during open(2), i.e. it
> > > > >>> disables all caching. That means every 1028-byte read turns
> > > > >>> into a 32KiB read (new default block size in 9.0), which
> > > > >>> explains the above numbers.
> > > > >>>
> > > > >>> I've copied the relevant vlc code below
> > > > >>> (modules/access/file.c:Open()). It's interesting to see that
> > > > >>> on OSX it sets F_NOCACHE which disables caching too, but
> > > > >>> combined with F_RDAHEAD there's still read-ahead caching.
> > > > >>>
> > > > >>> I don't think POSIX intended for NOREUSE to mean O_DIRECT.
> > > > >>> It should still cache data (and even do read-ahead if
> > > > >>> F_RDAHEAD is specified), and once data is fetched from the
> > > > >>> cache, it can be marked DONTNEED.
> > > > >>
> > > > >> POSIX doesn't specify O_DIRECT, so it's not clear what it
> > > > >> asks for.
> > > > >>
> > > > >>> Is it possible to implement it this way, or if not to just
> > > > >>> ignore the NOREUSE hint for now?
> > > > >>
> > > > >> I think it would be good to improve NOREUSE, though I had
> > > > >> sort of assumed that applications using NOREUSE would do
> > > > >> their own buffering and read full blocks. We could perhaps
> > > > >> reimplement NOREUSE by doing the equivalent of
> > > > >> POSIX_FADV_DONTNEED after each read to free buffers and pages
> > > > >> after the data is copied out to userland. I also have an XXX
> > > > >> about whether or not NOREUSE should still allow read-ahead as
> > > > >> it isn't very clear what the right thing to do there is.
> > > > >> HP-UX (IIRC) has an fadvise() that lets you specify multiple
> > > > >> policies, so you could specify both NOREUSE and SEQUENTIAL
> > > > >> for a single region to get read-ahead but still release
> > > > >> memory once the data is read once.
> > > > >
> > > > > So I've come up with this untested patch. It uses
> > > > > VOP_ADVISE(FADV_DONTNEED) after read(2) calls to a NOREUSE
> > > > > region, and leaves read-ahead caching enabled for NOREUSE.
> > > > > FADV_DONTNEED doesn't do any good really for writes (it only
> > > > > flushes clean buffers), so I've left write(2) operations as
> > > > > using IO_DIRECT still. Does this sound reasonable? I've not
> > > > > yet tested this at all:
> > > >
> > > > The patch drastically improves vlc, but there's still a tiny
> > > > overhead. Without NOREUSE the disk is read in chunks of 128KiB
> > > > (F_RDAHEAD buffer size). With NOREUSE there's an extra transfer
> > > > of 32KiB (block size).
> > >
> > > This is probably because vlc is not reading on block boundaries,
> > > so the noreuse is throwing away partial blocks at the end of a
> > > read that then have to be re-read. We could maybe fix this by
> > > making FADV_DONTNEED only throw away completely-contained blocks
> > > rather than completely-contained pages. However, this will
> > > probably result in NOREUSE not actually throwing away anything at
> > > all if an app always reads sub-blocksize chunks.
> > >
> > > We could maybe make the case of vlc work ok in this case though
> > > by allowing an extension where you can do
> > > 'posix_fadvise(SEQUENTIAL | NOREUSE)', and in this case we could
> > > make the VOP_ADVISE(DONTNEED) in read() use an offset of 0 rather
> > > than the start of the read request.
> > >
> > > However, posix_fadvise() really is going to work best if the
> > > userland application reads aligned FS blocks.
> >
> > I find it questionable in general that an application can tell the
> > system what to do wrt. caching. Perhaps I'm running 100s of VLC
> > players all on the same file and actually *do* want reads to be
> > cached?
> >
> > What happens if I seek back in the file? It has to do a potentially
> > high-latency read again. The system has a better overview of blocks
> > that are frequently being requested than any individual
> > application.
> >
> > I fully understand the intention, and in 99.99% of the cases, this
> > data *is* just being read once so there's no need to cache any
> > reads for actually requested data. But as the example shows,
> > requested data is not necessarily the data that lower layers have
> > to fetch from the disk.
> >
> > Perhaps talking to VLC people on why they think this is useful and
> > where it actually, measurably helped them would be interesting.
> >
> > Sorry if this is all perfectly obvious
>
> There are certainly cases where the user can choose to run specific
> apps in such a way where this makes sense, so the OS needs this
> functionality. As to whether or not specific apps should use these
> APIs or if they should make use of these APIs configurable, that is
> a question for each app (e.g. vlc). However, the OS should provide
> the tools.
>
I'd agree. However, there might be an argument for a sysctl that
tells the OS to ignore the hints, so a sysadmin can work around a
case where an app runs poorly in their environment, due to the hint?

rick

Received on Wed Feb 01 2012 - 00:34:02 UTC
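
The read amplification discussed in this thread can be checked with a
little arithmetic. The following is only a rough sketch in C, not code
from vlc or from the FreeBSD patch: the function names are hypothetical,
and it assumes that with caching disabled every filesystem block touched
by a read is fetched whole. It also takes 320 kbps to mean 40000 bytes
per second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round offset down to the start of its block. */
uint64_t block_floor(uint64_t offset, uint64_t block_size)
{
    assert(block_size > 0);
    return offset - (offset % block_size);
}

/* Hypothetical helper: round offset up to the next block boundary
 * (an already aligned offset is returned unchanged). */
uint64_t block_ceil(uint64_t offset, uint64_t block_size)
{
    assert(block_size > 0);
    uint64_t rem = offset % block_size;
    return rem == 0 ? offset : offset + (block_size - rem);
}

/*
 * Bytes the disk has to transfer for one read of len bytes at offset
 * when nothing is cached (assumption: every block the request touches
 * is fetched in full).
 */
uint64_t uncached_bytes(uint64_t offset, uint64_t len, uint64_t block_size)
{
    if (len == 0)
        return 0;
    return block_ceil(offset + len, block_size) -
           block_floor(offset, block_size);
}

/*
 * Approximate disk traffic in bytes per second for a stream consumed
 * at stream_bytes_per_sec in reads of read_size bytes, assuming each
 * read costs exactly one block. Reads that straddle a block boundary
 * cost more, so this is a lower bound.
 */
uint64_t uncached_rate(uint64_t stream_bytes_per_sec, uint64_t read_size,
                       uint64_t block_size)
{
    assert(read_size > 0);
    return stream_bytes_per_sec * block_size / read_size;
}
```

With the numbers from the thread, uncached_rate(40000, 1028, 32768)
comes to roughly 1.28 million bytes per second, in line with the
figure of about 1250kBps that gstat reported. A 1028-byte read that
crosses a block boundary touches two blocks under the same assumption,
whereas a read of one aligned 32KiB block transfers no excess data,
which matches the remark that posix_fadvise() works best when the
application reads aligned FS blocks.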