Konstantin Belousov wrote:
>On Thu, Aug 09, 2018 at 08:38:50PM +0000, Rick Macklem wrote:
>> >BTW, does NFS server use extended attributes ? What for ? Can you, please,
>> >point out the code which does this ?
>> For the pNFS service, there are two system namespace extended attributes for
>> each file stored on the service.
>> pnfsd.dsfile - Stores where the data for the file is. Can be displayed by the
>>     pnfsdsfile(8) command.
>>
>> pnfsd.dsattr - Cached attributes that change when a file is written (size, mtime,
>>     change) so that the MDS doesn't have to do a Getattr on the data server for every client Getattr.
>>
>
>My reading of the nfsd code + ffs extattr handling reminds me that you
>already reported this issue some time ago. I suspected ufs_balloc() at
>that time.
Yes. I had almost forgotten about them, because I have been testing with a
couple of machines (not big, but amd64 with a few Gbytes of RAM) and they
never hit the panic(). Recently, I've been using the 256Mbyte i386 and
started seeing them again.

>Now I think that the situation with the stray buffers hanging on the
>queue is legitimate, ffs_extread() might create such buffer and release
>it to a clean queue, then removal of the file would see inode with no
>allocated ext blocks but with the buffer.
>
>I think the easiest way to handle it is to always flush buffers and pages
>in the ext attr range, regardless of the number of allocated ext blocks.
>Patch below was not tested.
[patch deleted for brevity]

Well, the above sounds reasonable, but the patch didn't help.
Here's a small portion of the log from a test run last night.
- First, a couple of things about the printf()s. When they start with
  "CL=<N>", the printf() is at the start of ffs_truncate(). "<N>" is a
  static counter of calls to ffs_truncate(), so "same value" indicates
  same call.
CL=31816 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
buf at 0x429f260
b_flags = 0x20001020<vmio,reuse,cache>, b_xflags=0x2<clean>, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfa3f734), b_data = 0x4c90000, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x4c90000, b_kvasize = 32768

CL=34593 flags=0xc00 vtyp=1 bodirty=0 boclean=1 diextsiz=320
buf at 0x429deb0
b_flags = 0x20001020<vmio,reuse,cache>, b_xflags=0x2<clean>, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfd3da94), b_data = 0x5700000, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x5700000, b_kvasize = 32768

FFST3=34593 vtyp=1 bodirty=0 boclean=1
buf at 0x429deb0
b_flags = 0x20001020<vmio,reuse,cache>, b_xflags=0x2<clean>, b_vflags=0x0
b_error = 0, b_bufsize = 4096, b_bcount = 4096, b_resid = 0
b_bufobj = (0xfd3da94), b_data = 0x5700000, b_blkno = -1, b_lblkno = -1, b_dep = 0
b_kvabase = 0x5700000, b_kvasize = 32768

So, the first one is what typically happens and there would be no panic().
The second/third would be a panic(), since the one that starts with "FFST3"
is a printf() that replaces the panic() call.
- Looking at the second/third, the number at the beginning is the same, so
  it is the same call, but for some reason, between the start of the
  function and where the ffs_truncate3 panic() test is, di_extsize has been
  set to 0, but the buffer is still there (or has been re-created there by
  another thread?).

Looking at the code, I can't see how this could happen, since there is a
vinvalbuf() call after the only place in the code that sets di_extsize == 0,
from what I can see?

I am going to add printf()s after the vinvalbuf() calls, to make sure they
are happening and getting rid of the buffer.

If another thread could somehow (re)create the buffer concurrently with the
ffs_truncate() call, that would explain it, I think?
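
For anyone wanting to reproduce the "CL=<N>" / "FFST3=<N>" tagging described above, here is a minimal userland sketch of the idea. It is not the actual debugging patch: fake_inode, fake_truncate() and tag_line() are hypothetical names, the log lines carry only the counter and an ext attr size, and taking a per-call snapshot of the counter is an assumption about how one would keep the two lines matched.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the inode field the log prints as "diextsiz". */
struct fake_inode {
	int di_extsize;
};

/* Counts calls to fake_truncate(); static storage, so it persists across calls. */
static unsigned long truncate_calls;

/* Format one tagged log line into buf; returns what snprintf() returns. */
int
tag_line(char *buf, size_t len, const char *tag, unsigned long call,
    int extsize)
{
	assert(buf != NULL && tag != NULL);
	return (snprintf(buf, len, "%s=%lu diextsiz=%d", tag, call, extsize));
}

/*
 * Hypothetical stand-in for ffs_truncate(): log at entry, clear the
 * ext attr size, then log again where the later check would sit.
 * Both lines carry the same counter value, so they can be paired up.
 */
unsigned long
fake_truncate(struct fake_inode *ip, char *entry, char *check, size_t len)
{
	unsigned long call = ++truncate_calls;	/* snapshot for this call */

	tag_line(entry, len, "CL", call, ip->di_extsize);
	ip->di_extsize = 0;
	tag_line(check, len, "FFST3", call, ip->di_extsize);
	return (call);
}
```

Called once on a fake_inode with di_extsize set to 320, this fills the two buffers with "CL=1 diextsiz=320" and "FFST3=1 diextsiz=0". Note that a plain static counter like this is not atomic, so with concurrent callers in a kernel the increment itself could race; the sketch ignores that.
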
Just a wild guess, but I suspect softdep_slowdown() is flipping, due to the
small size of the machine and this makes the behaviour of ffs_truncate()
confusing.

I'll post again when I have more info.
Thanks for looking at it, rick

Received on Sat Aug 11 2018 - 10:05:27 UTC
This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:17 UTC