Dimitry Andric wrote:
> On 2010-08-17 23:24, Alan Cox wrote:
>
>>> So normal mmap is ~3% slower, and prefault mmap does not seem to make
>>> any measurable difference. I guess the added complexity is not really
>>> worth it, for now.
>>>
>> Do you know what fraction of this time is being spent in the kernel?
>>
>
> I ran 100 trials again, but now using "time -a -o logfile", so I could
> run ministat over the accumulated results. This gives:
>
> x gnugrep
> + bsdgrep-r210927 (the initial version that started this thread)
> * bsdgrep-r211490 (current version)
> % bsdgrep-r211490-mmap-plain
> # bsdgrep-r211490-mmap-prefault
>
> Real time:
>     N   Min    Max  Median     Avg       Stddev
> x 100  1.15   1.98    1.18  1.2122   0.11159613
> + 100  8.57  14.26    8.79  9.1823    1.0496126
> * 100  2.81   6.57    2.91  3.0189    0.4304259
> % 100  2.34   4.03    2.99  3.0022   0.12635992
> # 100  2.85   3.49    2.88  2.8981  0.075232904
>
> User time:
>     N   Min    Max  Median     Avg       Stddev
> x 100     0   0.07    0.03  0.0239  0.015627934
> + 100   1.6   3.33     1.9   1.976   0.30264824
> * 100  0.29      1    0.39  0.4004   0.08696824
> % 100   1.8   3.56    2.73  2.7274   0.13260117
> # 100  2.78   3.04    2.81  2.8238   0.04039652
>
> System time:
>     N   Min    Max  Median     Avg       Stddev
> x 100  1.08   1.91    1.15  1.1809   0.10953617
> + 100  6.55   10.9    6.94  7.1905   0.77911809
> * 100  2.38    5.5    2.53  2.6061   0.35068445
> % 100  0.18   0.53    0.25  0.2645  0.053586049
> # 100  0.03   0.54    0.06  0.0668  0.052259647
>
> E.g. it looks like bsdgrep with 'plain' mmap performs almost the same
> as the regular bsdgrep (both around 3.0s average), but with mmap much
> more of the time is spent in user mode.
>
>

That makes sense to me. With traditional I/O, such as read(2), the
copyout to user space fills the processor's data cache with the data to
be processed. Grep's core algorithm in user space shouldn't be
experiencing cache misses to obtain the data. By and large, the cache
misses will have occurred in the kernel. However, once you switch to
mmap(2), the kernel never touches the data, and all cache misses occur
in user space.
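
As a rough illustration of the two I/O paths being compared, here is a
minimal sketch, not the actual bsdgrep code. The function names
count_newlines(), count_newlines_read() and count_newlines_mmap() are
hypothetical, and counting newlines merely stands in for grep's scanning
loop. Whether the "prefault" variant in this thread used
MAP_PREFAULT_READ is an assumption; the flag is only applied where the
platform defines it.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for grep's scan: count '\n' bytes in buf[0..len). */
static long
count_newlines(const char *buf, size_t len)
{
    const char *p = buf, *end = buf + len;
    long n = 0;

    while (p < end && (p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
        n++;
        p++;
    }
    return (n);
}

/* Traditional I/O: the kernel copies each chunk out to a user buffer. */
long
count_newlines_read(const char *path)
{
    char buf[65536];
    long total = 0;
    ssize_t got;
    int fd;

    assert(path != NULL);
    if ((fd = open(path, O_RDONLY)) == -1)
        return (-1);
    /* After read(2) returns, buf was just written by the kernel's copyout. */
    while ((got = read(fd, buf, sizeof(buf))) > 0)
        total += count_newlines(buf, (size_t)got);
    close(fd);
    return (got == -1 ? -1 : total);
}

/* mmap(2): no copy; user code is the first to touch the file's pages. */
long
count_newlines_mmap(const char *path)
{
    struct stat st;
    long total;
    void *map;
    int fd, flags = MAP_PRIVATE;

    assert(path != NULL);
    if ((fd = open(path, O_RDONLY)) == -1)
        return (-1);
    if (fstat(fd, &st) == -1) {
        close(fd);
        return (-1);
    }
    if (st.st_size == 0) {          /* mmap(2) rejects zero-length maps */
        close(fd);
        return (0);
    }
#ifdef MAP_PREFAULT_READ
    /* Assumption: ask the kernel to pre-map resident pages (FreeBSD). */
    flags |= MAP_PREFAULT_READ;
#endif
    map = mmap(NULL, (size_t)st.st_size, PROT_READ, flags, fd, 0);
    close(fd);                      /* the mapping outlives the descriptor */
    if (map == MAP_FAILED)
        return (-1);
    total = count_newlines(map, (size_t)st.st_size);
    munmap(map, (size_t)st.st_size);
    return (total);
}
```

Both functions do the same work in user space. If the reasoning above is
right, what differs is where the file data is first pulled into the
cache: inside the kernel's copy for the read(2) version, and inside
count_newlines() itself for the mmap(2) version.
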
You ought to be able to confirm this with pmcstat's sampling mode set
to sample L2 cache misses.

Here is what actually puzzles me about these results. With traditional
I/O, even after the optimizations to bsdgrep, the system time for
gnugrep is still less than half that of the optimized bsdgrep. I
haven't looked at the changes, but I would have thought the system time
for gnugrep and bsdgrep would be almost the same.

> And it seems prefaulting does help now! I guess it also makes sense to
> add madvise(..., MADV_SEQUENTIAL)?
>
>

This won't matter as long as you are working with memory resident
files. With a memory resident file, it would only be a waste of cycles.

>> Does
>> the value of "sysctl vm.pmap.pde.mappings" increase as a result of your
>> test? If not, there is still room for improvement in the results with
>> mmap().
>>
>
> It always stays at 0, I have never seen any other value.
>

Addressing this issue would mostly affect the system time, which is
already tiny for mmap-prefault, so I wouldn't be concerned about this
(yet).

Did you ever describe your test machine? If so, I'm sorry, but I missed
that. Is it running an amd64 or i386 kernel? Can you briefly describe
what kind of processor and memory it has?

Regards,
Alan

Received on Thu Aug 19 2010 - 14:16:58 UTC