Hello!

At Netflix and Nginx we are experimenting with improving FreeBSD with respect to sending large amounts of static data via HTTP.

One of the approaches we are experimenting with is a new sendfile(2) implementation that doesn't block on the I/O done from the file descriptor.

The problem with classic sendfile(2) is that if the request length is large enough and the file data is not cached in VM, then the sendfile(2) syscall does not return until it has filled the socket buffer with data. On the modern Internet, socket buffers can be up to 1 MB, so the time taken by the syscall rises by an order of magnitude. All that time, the nginx worker is blocked in the syscall and doesn't process data from other clients.

The best current practice to mitigate that is known as "sendfile(2) + aio_read(2)". This is a special mode of nginx operation on FreeBSD. The sendfile(2) call is issued with the SF_NODISKIO flag, which forbids the syscall to perform disk I/O, so it sends only data that is already cached by VM. If sendfile(2) reports that I/O needs to be done (but is forbidden), then nginx does an aio_read() of a chunk of the file. The data read is cached by VM as a side effect. Then sendfile() is called again.

Now for the new sendfile. The core idea is that sendfile() schedules the I/O but doesn't wait for it to complete. It returns immediately to the process, and I/O completion is processed in kernel context. Unlike aio(4), no additional threads in the kernel are created. The new sendfile is a drop-in replacement for the old one. Applications (like nginx) don't need to be recompiled or reconfigured. The SF_NODISKIO flag is ignored.

At Netflix, we already see improvements with the new sendfile(2). We can send more data using the same amount of CPU, and we can push closer to 0% idle without experiencing short lags.

However, we have a somewhat modified VM subsystem, which behaves optimally for our task but suboptimally for an average FreeBSD system.
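
For readers unfamiliar with the "sendfile(2) + aio_read(2)" mode described above, here is a minimal sketch in C of one step of that loop. It is not nginx code: the names sf_step, sf_classify and sf_send_or_prefetch are hypothetical, and the sketch assumes a non-blocking socket and a caller that collects the aio completion through its own event loop. The FreeBSD-specific part is guarded so that the decision logic alone compiles elsewhere.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <errno.h>
#include <string.h>

#ifdef __FreeBSD__
#include <aio.h>
#endif

/* Hypothetical names for this sketch; they are not nginx identifiers. */
enum sf_step {
    SF_STEP_DONE,       /* the whole requested range was sent */
    SF_STEP_RETRY,      /* socket buffer full or call interrupted */
    SF_STEP_NEED_READ,  /* data is not cached by VM, disk I/O is needed */
    SF_STEP_ERROR       /* any other failure */
};

/* Map the result of sendfile(..., SF_NODISKIO) to the next action. */
enum sf_step
sf_classify(int rc, int err)
{
    if (rc == 0)
        return SF_STEP_DONE;
    if (err == EBUSY)               /* reported only with SF_NODISKIO */
        return SF_STEP_NEED_READ;
    if (err == EAGAIN || err == EINTR)
        return SF_STEP_RETRY;
    return SF_STEP_ERROR;
}

#ifdef __FreeBSD__
/*
 * Try to send *len bytes starting at *off from fd to sock without
 * touching the disk. If disk I/O would be needed, start an asynchronous
 * read of buflen bytes at the stalled offset; the data read ends up
 * cached by VM as a side effect. The caller keeps cb and buf alive until
 * the read completes and then calls this function again.
 */
enum sf_step
sf_send_or_prefetch(int fd, int sock, off_t *off, size_t *len,
    struct aiocb *cb, void *buf, size_t buflen)
{
    off_t sent = 0;
    int rc = sendfile(fd, sock, *off, *len, NULL, &sent, SF_NODISKIO);
    int err = errno;
    enum sf_step step;

    /* Partial data may have been sent even when an error is reported. */
    *off += sent;
    *len -= (size_t)sent;

    step = sf_classify(rc, err);
    if (step != SF_STEP_NEED_READ)
        return step;

    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_offset = *off;
    cb->aio_buf = buf;
    cb->aio_nbytes = buflen;
    if (aio_read(cb) == -1)
        return SF_STEP_ERROR;
    return SF_STEP_NEED_READ;
}
#endif
```

With the new sendfile the SF_NODISKIO flag is ignored, so the SF_STEP_NEED_READ branch of this sketch would presumably never be taken and the aio_read() fallback would simply go unused.
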
I'd like someone from the community to try the new sendfile(2) on a different setup and see how it works for you.

To be an early tester, you need to check out the projects/sendfile branch and build a kernel from it. The world from head/ will run fine with it.

    svn co http://svn.freebsd.org/base/projects/sendfile
    cd sendfile
    ... build kernel ...

Limitations:
- Some subsystems that use socket buffers do not compile, namely SCTP.
- No testing was done on serving files from NFS.
- No testing was done on serving files from ZFS.
- There is an mbuf leak. The leak is very slow: it takes 3 days of serving up to 20 Gbit/s to deplete the cluster zone. I'm working on finding the leak.

-- 
Totus tuus, Glebius.

Received on Mon Feb 17 2014 - 10:16:45 UTC