FreeBSD Project Quarterly Status Report - 2nd Quarter 2019

This quarter our report includes some interesting topics easily accessible to anyone, even if you are not a programmer: we report the link to a presentation of the 2019 FreeBSD survey results at BSDCan 2019 and describe an interesting experience of a 3-person hackathon, which might encourage you to host one yourself, possibly with more participants. We also provide some up-to-date information about the status of our IRC channels.

For those who have some more technical skills, we give some news about the role of Git in the FreeBSD project, describe the status of some tools to hunt bugs or enhance security, and announce a clone of sysctl.

Finally, those who are more experienced with programming will probably be interested in the great work that has been done with drivers: in particular, an acknowledgement is due to Alan Somers for having started to bring up to date our FUSE implementation, which was about 11 years behind. Other important improvements include a more user-friendly experience with trackpoints and touchpads enabled by default, much low-level work on graphics, many new bhyve features, updates to the Linux compatibility layer, and various kernel improvements.

Have a nice read!
-- Lorenzo Salvadore
__________________________________________________________________

FreeBSD Team Reports

* Continuous Integration
* FreeBSD Core Team
* FreeBSD Graphics Team status report
* IRC Admin
* Ports Collection
* Release Engineering Team

Projects

* bhyve - Live Migration
* bhyve - Save/Restore
* BIO_DELETE support for the swap pager
* ENA FreeBSD Driver Update
* FreeBSD SDIO and Broadcom FullMAC WiFi Support
* FUSE
* Fuzzing FreeBSD with syzkaller
* Kernel ZLIB Update
* Linux compatibility layer update
* Lock-less delayed invalidation for amd64 pmap
* Locking changes for vnodes during execve(2)
* Mellanox Drivers Update
* NFSv4.2 client/server implementation for FreeBSD
* NUMA awareness in the FreeBSD kernel

Architectures

* Broadcom ARM64 SoC support
* NXP ARM64 SoC support

Third-Party Projects

* Aberdeen Hackathon
* Bring more Security Intelligence to FreeBSD
* libvdsk - QCOW2 implementation
* nsysctl 1.0
__________________________________________________________________

FreeBSD Team Reports

Entries from the various official and semi-official teams, as found in the Administration Page.

Continuous Integration

Links
FreeBSD Jenkins Instance URL: https://ci.FreeBSD.org
FreeBSD CI artifact archive URL: https://artifact.ci.FreeBSD.org/
FreeBSD Jenkins wiki URL: https://wiki.freebsd.org/Jenkins
freebsd-testing Mailing List URL: https://lists.FreeBSD.org/mailman/listinfo/freebsd-testing
freebsd-ci Repository URL: https://github.com/freebsd/freebsd-ci
Tickets related to freebsd-testing_at_ URL: https://preview.tinyurl.com/y9maauwg
Hosted CI wiki URL: https://wiki.freebsd.org/HostedCI
FreeBSD CI weekly report URL: https://hackfoldr.org/freebsd-ci-report/

Contact: Jenkins Admin <jenkins-admin_at_FreeBSD.org>
Contact: Li-Wen Hsu <lwhsu_at_FreeBSD.org>

The FreeBSD CI team maintains the continuous integration system and related tasks for the FreeBSD project.
The CI system regularly checks that the committed changes can be built successfully, then performs various tests and analysis of the results. The results from build jobs are archived on an artifact server for further testing and debugging needs. The CI team members examine the failing builds and unstable tests, and work with the experts in the relevant area to fix the code or adjust the test infrastructure. The details of these efforts are available in the weekly CI reports.

The FCP for CI policy is in "feedback" state; please provide any comments to freebsd-testing_at_ or other suitable lists.

We had a testing working group at the 201905 DevSummit. Please see freebsd-testing_at_ related tickets for more information.

Work in progress:

* Fixing the failing test cases and builds
* Adding drm ports building test against -CURRENT
* Adding powerpc64 tests job: https://github.com/freebsd/freebsd-ci/pull/33
* Implementing automatic tests on bare metal hardware
* Extending and publishing the embedded testbed
* Planning for running ztest and network stack tests
* Helping more third-party software get CI on FreeBSD through a hosted CI solution
__________________________________________________________________

FreeBSD Core Team

Contact: FreeBSD Core Team <core_at_FreeBSD.org>

The FreeBSD Core Team is the governing body of FreeBSD.

* Core approved source commit bits for Doug Moore (dougm), Chuck Silvers (chs), Brandon Bergren (bdragon), and a vendor commit bit for Scott Phillips (scottph).
* The annual developer survey closed on 2019-04-02. Of the 397 developers, 243 took the survey with an average completion time of 12 minutes. The public survey closed on 2019-05-13. It was taken by 3637 users and had a 79% completion rate. A presentation of the survey results took place at BSDCan 2019.
* The core team voted to appoint a working group to explore transitioning our source code 'source of truth' from Subversion to Git.
Core asked Ed Maste to chair the group as Ed has been researching this topic for some time. For example, Ed gave a MeetBSD 2018 talk on the topic. There is a variety of viewpoints within core regarding where and how to host a Git repository; however, core feels that Git is the prudent path forward.
* The project received many Season of Docs submissions and picked a top candidate. Google will announce the accepted technical writer projects on 2019-08-06. We are hoping for lots of new and refreshed man pages.
__________________________________________________________________

FreeBSD Graphics Team status report

Links
Project GitHub page URL: https://github.com/FreeBSDDesktop

Contact: FreeBSD Graphics Team <x11_at_freebsd.org>
Contact: Niclas Zeising <zeising_at_freebsd.org>

The FreeBSD X11/Graphics team maintains the lower levels of the FreeBSD graphics stack. This includes graphics drivers, graphics libraries such as the Mesa OpenGL implementation, the X.org xserver with related libraries and applications, and Wayland with related libraries and applications.

In the last report, half a year ago, several updates and changes had been made to the FreeBSD graphics stack. To further improve the user experience, and to improve input device handling, evdev was enabled in the default configuration in late 2018. Building on that, we have enabled IBM/Lenovo trackpoints and Elantech and Synaptics touchpads by default as well.

The input device library libinput has been updated as the last in a series of updates bringing the userland input stack up to date. This is work that was started in 2018.

We have made several improvements to the drm kernel drivers. A long-standing memory leak in the Intel (i915) driver has been fixed, and several other updates and improvements have been made to the various drm kernel driver components.

A port of the drm kernel drivers using the 5.0 Linux kernel sources has been created and committed to FreeBSD ports as graphics/drm-devel-kmod.
This driver requires a recent Linux KPI and is only available on recent versions of FreeBSD CURRENT.

This version of the driver contains several development improvements. The generic drm (drm.ko) driver as well as the i915 (i915kms.ko) driver can now be unloaded and reloaded to ease development and testing. This causes issues with the virtual consoles, however, so an SSH connection is recommended. To aid debugging of i915kms.ko, use of debugfs has been improved, but there are still limitations preventing it from being fully functional. Since debugfs is based on pseudofs it is possible that this will prevent a fully functional debugfs in its current state, so we might have to look into adding the required functionality to pseudofs or use another framework.

The new in-kernel drm driver for VirtualBox, vboxvideo.ko, has been ported from Linux. Support is currently an experimental work in progress. For example, the virtual console won't update after loading the driver, but X- and Wayland-based compositors are working.

Mesa has been updated to 18.3.2 and switched from using devel/llvm60 to use the Ports default version of llvm, currently devel/llvm80.

Several userland Xorg drivers, applications, and libraries have been updated, and other improvements to the various userland components that make up the Graphics Stack have been made.

We have also continued our regularly scheduled bi-weekly meetings, although work remains in sending out timely meeting minutes afterwards.

People who are interested in helping out can find us on the x11_at_FreeBSD.org mailing list, or on our gitter chat: https://gitter.im/FreeBSDDesktop/Lobby. We are also available in #freebsd-xorg on EFNet.
We also have a team area on GitHub where our work repositories can be found: https://github.com/FreeBSDDesktop
__________________________________________________________________

IRC Admin

Contact: IRC Admin <irc_at_FreeBSD.org>

The FreeBSD IRC Admin team manages the FreeBSD Project's presence and activity on the freenode IRC network, looking after:

* Registration and management of channels within the official namespace (#freebsd*)
* Channel moderation
* Liaising with freenode staff
* Allocating freebsd/* hostmask cloaks for users
* General user support relating to channel management

While the FreeBSD Project does not currently endorse IRC as an official support channel (see here and here), as it has not been able to guarantee a consistent or positive user experience, IRC Admin has been working toward creating a high-quality experience by standardising channel administration and moderation expectations, and ensuring the project's ability to manage all channels within its namespace.

In the last quarter, IRC Admin:

* Cleaned up (deregistered) registrations for channels that were defunct, stale, out of date, or had founders that were inactive (not seen for > 1 year). Channels that were found to be otherwise active have been retained. FreeBSD now has ~40 channels registered from a previous total of over 150.
* Documented baseline configuration settings in the Wiki for channels, including ChanServ settings, channel modes, registration policy, etc.
* Established multiple documented methods for reporting user abuse or other channel issues to IRC Admin for resolution.

Upcoming changes:

* Work with existing #freebsd* channels to standardise channel management, settings and access.
* Migrate, forward and/or consolidate existing or duplicate #freebsd* channels to channels with a standard naming convention.
* Work with unofficial ##freebsd* channels to migrate them to the official #freebsd* channels if suitable.
* Update existing IRC-related website and documentation sources to describe the official state of the project-managed IRC presence on freenode.

Lastly, and to repeat a previous call, while the vast majority of the broader user community interacts on the freenode IRC network, the FreeBSD developer presence still needs to be significantly improved on freenode.

There are many opportunities to be had by increasing the amount and quality of interaction between FreeBSD users and developers, both in terms of developers keeping their finger on the pulse of the community and in encouraging and cultivating greater contributions to the Project over the long term.

It is critical to have a strong developer presence amongst users, and IRC Admin would like again to call on all developers to join the FreeBSD freenode channels to increase that presence.

Users are invited to /join #freebsd-irc on the freenode IRC network if they have questions, ideas, constructive criticism, and feedback on how the FreeBSD Project can improve the service and experience it provides to the community on IRC.
__________________________________________________________________

Ports Collection

Links
About FreeBSD Ports URL: https://www.FreeBSD.org/ports/
Contributing to Ports URL: https://www.freebsd.org/doc/en_US.ISO8859-1/articles/contributing/ports-contributing.html
FreeBSD Ports Monitoring URL: http://portsmon.freebsd.org/index.html
Ports Management Team URL: https://www.freebsd.org/portmgr/index.html

Contact: René Ladan <portmgr-secretary_at_FreeBSD.org>
Contact: FreeBSD Ports Management Team <portmgr_at_FreeBSD.org>

The following was done during the last quarter by portmgr to keep things in the Ports Tree going:

During the last quarter the number of ports rose to just under 37,000.
At the end of the quarter, there were 2146 open PRs and 7837 commits (excluding 499 on the quarterly branch) from 172 committers. This shows a slight decrease in activity compared to the previous quarter.

People come and go. Last quarter we welcomed Pedro Giffuni (pfg_at_), Piotr Kubaj (pkubaj_at_) and Hans Petter Selasky (hselasky_at_). Pedro and Hans Petter were already active as src committers. We said goodbye to gordon_at_, kan_at_, tobez_at_, and wosch_at_.

On the infrastructure side, a new USES=cabal was introduced and various default versions were updated: MySQL to 5.7, Python to 3.6, Ruby to 2.5, and Samba to 4.8. Julia gained a default version of 1.0. The web browsers were also updated: Firefox to 68.0 and Chromium to 75.0.3770.100.

During the last quarter, antoine_at_ ran a total of 41 exp-runs to test various package updates, bump the stack protector level to "strong", switch the default Python version to 3.6 as opposed to 2.7, remove sys/dir.h from base, which has been deprecated for over 20 years, and convert all Go ports to USES=go.
__________________________________________________________________

Release Engineering Team

Links
FreeBSD 11.3-RELEASE schedule URL: https://www.freebsd.org/releases/11.3R/schedule.html
FreeBSD 11.3-RELEASE announcement URL: https://www.freebsd.org/releases/11.3R/announce.html
FreeBSD 12.1-RELEASE schedule URL: https://www.freebsd.org/releases/12.1R/schedule.html
FreeBSD development snapshots URL: https://download.freebsd.org/ftp/snapshots/ISO-IMAGES/

Contact: FreeBSD Release Engineering Team <re_at_FreeBSD.org>

The FreeBSD Release Engineering Team is responsible for setting and publishing release schedules for official project releases of FreeBSD, announcing code freezes and maintaining the respective branches, among other things.

During the second quarter of 2019, the FreeBSD Release Engineering team started the 11.3-RELEASE cycle, with the code slush starting May 3rd.
Throughout the cycle, there were three BETA builds and three RC builds, all of which were in line with the originally published schedule. The final RC build started June 28th, with the final release build targeted for July 5th.

FreeBSD 11.3-RELEASE will be the fourth release from the stable/11 branch, building on the stability and reliability of 11.2-RELEASE.

The FreeBSD Release Engineering Team also published the schedule for the 12.1-RELEASE, targeted to start September 6th. One important thing to note regarding the published schedule is that it excludes a hard freeze on the stable/12 branch, as a test run for eliminating code freezes entirely during a release cycle. Commits to what will be the releng/12.1 branch will still require explicit approval from the Release Engineering Team, however.

Additionally, throughout the quarter, several development snapshot builds were released for the head, stable/12, and stable/11 branches.

Much of this work was sponsored by the FreeBSD Foundation and Rubicon Communications, LLC (Netgate).
__________________________________________________________________

Projects

Projects that span multiple categories, from the kernel and userspace to the Ports Collection or external projects.

bhyve - Live Migration

Links
GitHub wiki - How to Live and Warm Migrate a bhyve guest URL: https://github.com/FreeBSD-UPB/freebsd/wiki/Virtual-Machine-Migration-using-bhyve
GitHub - Warm Migration branch URL: https://github.com/FreeBSD-UPB/freebsd/tree/projects/bhyve_migration
GitHub - Live Migration branch URL: https://github.com/FreeBSD-UPB/freebsd/tree/projects/bhyve_migration_dev

Contact: Elena Mihailescu <elenamihailescu22_at_gmail.com>
Contact: Darius Mihai <dariusmihaim_at_gmail.com>
Contact: Mihai Carabas <mihai_at_freebsd.org>

The Migration feature uses the Save/Restore feature to migrate a bhyve guest from a FreeBSD host to another FreeBSD host.
To migrate a bhyve guest, one needs to start an empty guest on the destination host from a shared guest image using the bhyve tool with the -R option followed by the source host IP and the port on which to listen for migration requests. On the source host, the migration is started by executing the bhyvectl command with the --migrate or --migrate-live option, followed by the destination host IP and the port to send the messages to.

New features added:

* Clear the dirty bit after each migration round
* Extend live migration to highmem segment

Future tasks:

* Refactor live migration branch
* Rebase live migration
* Extend live migration to unwired memory

This project was sponsored by Matthew Grooms.
__________________________________________________________________

bhyve - Save/Restore

Links
GitHub repository for the snapshot feature for bhyve URL: https://github.com/FreeBSD-UPB/freebsd/tree/projects/bhyve_snapshot
GitHub wiki - How to Save and Restore a bhyve guest URL: https://github.com/FreeBSD-UPB/freebsd/wiki/Save-and-Restore-a-virtual-machine-using-bhyve
GitHub wiki - Suspend/resume test matrix URL: https://github.com/FreeBSD-UPB/freebsd/wiki/Suspend-Resume-test-matrix
Phabricator review - bhyve Snapshot Save and Restore URL: https://reviews.freebsd.org/D19495

Contact: Elena Mihailescu <elenamihailescu22_at_gmail.com>
Contact: Darius Mihai <dariusmihaim_at_gmail.com>
Contact: Mihai Carabas <mihai_at_freebsd.org>

The Save/Restore for bhyve feature is a suspend and resume facility added to the FreeBSD/amd64 hypervisor, bhyve. The bhyvectl tool is used to save the guest state in three files (a file for the guest memory, a file for the states of various devices and the state of the CPU, and another one for some metadata that is used in the restore process).

To suspend a bhyve guest, the bhyvectl tool must be run with the --suspend <state_file_name> option followed by the guest name.
To restore a bhyve guest from a checkpoint, one simply has to add the -r option followed by the main state file (the same file that was given to the --suspend option for bhyvectl) when starting the VM.

New features added:

* Open ticket on Phabricator
* Apply feedback received from community

Future tasks:

* Add suspend/resume support for nvme
* Add suspend/resume support for virtio-console
* Add suspend/resume support for virtio-scsi
* Add TSC offsetting for restore for AMD CPUs

This project was sponsored by Matthew Grooms.
__________________________________________________________________

BIO_DELETE support for the swap pager

Contact: Doug Moore <dougm_at_FreeBSD.org>
Contact: Alan Cox <alc_at_FreeBSD.org>
Contact: Mark Johnston <markj_at_FreeBSD.org>

An ongoing project aims to teach the swap pager to send SCSI UNMAP or ATA TRIM commands to the swap device when a block of swap space has been freed, for example when the application owning that block is exiting.

SSDs have become commonplace and feature low latency for random I/O requests. This makes them appealing for use as swap devices, since lower latencies mean that applications spend less time blocked while waiting for a page-in from the swap device. To maximize write performance, some SSDs require the operating system to send a notification to the disk when a sector is no longer in use; this helps the disk optimize its usage of NAND flash cells. In FreeBSD such a notification is called a BIO_DELETE.

FreeBSD's UFS and ZFS filesystems have for a long time been able to transmit BIO_DELETE requests to the devices backing the filesystem. For example, for UFS this support is enabled by specifying -t in newfs(8) or tunefs(8)'s parameters. However, FreeBSD has historically not had a corresponding implementation for swap devices.
Thanks to Doug Moore, as of r349286 in -CURRENT and r349930 in stable/12 swapon(8) can send BIO_DELETE to all blocks on the specified device immediately prior to configuring it as a swap device. This is enabled by specifying -E in the swapon(8) parameters, or by adding the "trimonce" option to the swap device's /etc/fstab entry. Some in-progress work on the swap pager implements online block deletion, in which BIO_DELETE is transmitted for blocks as they are freed by applications; this will hopefully be implemented in FreeBSD 13.0.
__________________________________________________________________

ENA FreeBSD Driver Update

Links
ENA README URL: https://github.com/amzn/amzn-drivers/blob/master/kernel/fbsd/ena/README

Contact: Michal Krawczyk <mk_at_semihalf.com>
Contact: Maciej Bielski <mba_at_semihalf.com>
Contact: Marcin Wojtas <mw_at_semihalf.com>

ENA (Elastic Network Adapter) is the smart NIC available in the virtualized environment of Amazon Web Services (AWS). The ENA driver supports multiple transmit and receive queues and can handle up to 100 Gb/s of network traffic, depending on the instance type on which it is used.

ENAv2 has been under development for FreeBSD, similar to Linux and DPDK. Since the last update internal review and improvements of the patches were done, followed by validation on various AWS instances.

Completed since the last update:

* Upstream of the ENAv2 patches - revisions r348383 - r348416 introduce a major driver upgrade to version v2.0.0. Along with various fixes and improvements, the most significant features are LLQ (Low Latency Queues) and independent queues reconfiguration using sysctl commands.
* Implement NETMAP support for ENA

Todo:

* Internal review and upstream of NETMAP support

This project was sponsored by Amazon.com Inc.
__________________________________________________________________

FreeBSD SDIO and Broadcom FullMAC WiFi Support

Links
FreeBSD Wiki SDIO page URL: https://wiki.freebsd.org/SDIO

Contact: Bjoern Zeeb <bz_at_FreeBSD.ORG>

SDIO is an interface designed as an extension to SD Cards to allow attachments of various other peripherals, e.g., WiFi or Bluetooth.

Work has been ongoing by Ilya Bakulin on the MMCCAM stack to provide the infrastructure to be able to have SD cards and SDIO devices attached side-by-side using FreeBSD's CAM framework. Based on this excellent work over the last years, SDIO support was finished earlier this year and committed to FreeBSD HEAD with the intention to merge to 12 at a later time.

Using the newly available SDIO bus, work started to port Broadcom's FullMAC WiFi driver. This work is still in progress and expected to complete later this year. With this, WiFi support for the Raspberry Pi and other embedded boards will become available. Likewise, drivers for other SDIO devices can be developed now.

This project was sponsored by The FreeBSD Foundation.
__________________________________________________________________

FUSE

Contact: Alan Somers <asomers_at_FreeBSD.org>

FUSE (File system in USErspace) allows a userspace program to implement a file system. It is widely used to support out-of-tree file systems like NTFS, as well as for exotic pseudo file systems like sshfs. FreeBSD's fuse driver was added as a GSoC project in 2012. Since that time, it has been largely neglected. The FUSE software is buggy and out-of-date. Our implementation is about 11 years behind.

During Q2 I nearly finished the FUSE overhaul that I began in Q1.
I raised the protocol level from 7.8 to 7.23, fixed many bugs (see 199934, 216391, 233783, 234581, 235773, 235774, 235775, 236226, 236231, 236236, 239291, 236329, 236379, 236381, 236405, 236327, 236466, 236472, 236473, 236474, 236530, 236557, 236560, 236647, 236844, 237052, 237181, 237588, and 238565), and added the following features:

* Optional kernel-side permissions checks (`-o default_permissions`)
* Implement VOP_MKNOD, VOP_BMAP, and VOP_ADVLOCK
* Allow interrupting FUSE operations
* Support named pipes and unix-domain sockets in fusefs file systems
* Forward UTIME_NOW during utimensat(2) to the daemon
* kqueue support for /dev/fuse
* Allow updating mounts with mount -u
* Allow exporting fusefs file systems over NFS
* Server-initiated invalidation of the name cache or data cache
* Respect RLIMIT_FSIZE
* Try to support servers as old as protocol 7.4

I also added the following performance enhancements:

* Implement FUSE's FOPEN_KEEP_CACHE and FUSE_ASYNC_READ flags
* Cache file attributes
* Cache lookup entries, both positive and negative
* Server-selectable cache modes: writethrough, writeback, or uncached
* Write clustering
* Readahead
* Use counter(9) for statistical reporting

All that remains is to finish merging the branch, and deal with any newly introduced bugs.

This project was sponsored by The FreeBSD Foundation.
__________________________________________________________________

Fuzzing FreeBSD with syzkaller

Links
syzkaller URL: https://github.com/google/syzkaller

Contact: Mark Johnston <markj_at_FreeBSD.org>
Contact: Andrew Turner <andrew_at_FreeBSD.org>
Contact: Michael Tuexen <tuexen_at_FreeBSD.org>
Contact: Ed Maste <emaste_at_FreeBSD.org>

See the syzkaller entry in the 2019q1 quarterly report for an introduction to syzkaller.

syzkaller continues to find FreeBSD kernel bugs. A number of such bugs have been fixed in the past quarter, and we continue to investigate and fix bug reports from syzkaller.
Work to extend syzkaller's capabilities has progressed: Andrew Turner has implemented support for fuzzing the 32-bit compatibility layer in amd64 kernels, helping illuminate some of the darker corners of the kernel, and it is now possible to use bhyve as a VM backend to syzkaller, so it is now efficient and convenient to fuzz FreeBSD on FreeBSD.

Some planned work includes: enabling the use of ZFS as the base filesystem for fuzzer VMs; extending the range of system calls and ioctls covered by syzkaller; enabling LLVM sanitizers in the kernel so as to catch more issues; and making use of netdump(4) to capture kernel dumps for panics found by syzkaller, making it much easier to diagnose bugs for which syzkaller was unable to find a reproducible test case.

This project was sponsored by The FreeBSD Foundation.
__________________________________________________________________

Kernel ZLIB Update

Links
Review D19706 URL: https://reviews.freebsd.org/D19706

Contact: Yoshihiro Ota <ota_at_j.email.ne.jp>

The kernel zlib upgrade is in progress. Xin (delphij_at_) and I have been working closely on the zlib upgrade. We relocated contrib/zlib to sys/contrib/zlib so that kernel code can access zlib in the tree. We also deleted dead code that depended on zlib and inflate; inflate is a fork of unzip used to uncompress gzip files. We also renamed crc.h to avoid conflicts with zlib/crc.h.

The next goal is to compile both the old zlib and the new zlib into the kernel, allowing each zlib user to be switched independently.
__________________________________________________________________

Linux compatibility layer update

Contact: Edward Tomasz Napierala <trasz_at_FreeBSD.org>

The project aims to improve the Linux compatibility layer, to make it more compatible with recent Linux releases, and also to lower the bar for potential developers who want to start contributing to it.

The initial effort focused on tooling, to make it easier to debug problems and to prevent future regressions.
The first part involved making it possible to use the Linux strace(1) utility and providing it as the linux-c7-strace package. The reason is that while FreeBSD truss(1) and ktrace(1) can trace Linux binaries, they cannot decode Linux-specific flags and structures. The second part involved providing Linux Test Project binaries as the linux-ltp package. There is ongoing work to hook it up to the FreeBSD CI infrastructure http://ci.FreeBSD.org.

There were also a number of improvements and fixes to bugs discovered in the process. One of them (not yet committed) fixes binaries linked against newer versions of libc, effectively unbreaking binaries from recent Ubuntu releases.

This project was sponsored by FreeBSD Foundation.
__________________________________________________________________

Lock-less delayed invalidation for amd64 pmap

Contact: Konstantin Belousov <kib_at_freebsd.org>

The Virtual Memory machine-dependent layer (pmap) on amd64 needs to track all mappings for the managed physical memory pages, to be able to either destroy all of them (for page-out), or change them from writeable to read-only (e.g. to sync the page content to file, without racing with modifications through user writes). The mappings are accounted for by creating a pv_entry which records the address space (implicitly, by linking the pv entry to the pmap) and the virtual address of the mapping.

Previous work split the lock protecting the pv entry lists from other VM locks into the pvh_global_lock lock, which was global for all address spaces. You can still see it in the i386 pmap.c. Later, hashed per-page pv list locks were introduced, which would reduce contention on pv list manipulations for different pages, but unfortunately the pvh_global_lock was still needed to guarantee the safety of some operations.

The problem arises because the amd64 pmap uses the pmap lock to protect page tables and TLB consistency; pmap locks are per-pmap and are distinct from the pv list locks.
When updating a page table entry, we never drop the pmap lock until the necessary TLB invalidation is done globally, including signalling other CPUs with an IPI. But pv list locks can be unlocked before the necessary invalidation is done. So, for instance, when pmap is asked to remove all mappings of a specific page (pmap_remove_all(9)), it checks the pv list of the page to find the mappings. The list might appear empty even though the TLBs of other CPUs have not yet been invalidated. If such a page is reused, other CPUs might change its content using cached TLB entries. Allowing that means allowing silent data corruption and opening a security hole. So the global pvh lock was held until all pmaps invalidated their TLBs.

This mechanism has obvious scalability issues, and instead a generation-count based scheme for handling delayed invalidation (DI) was developed, where each thread that might remove an entry from a pv list acquires a generation number and marks the page with it, see pmap_delayed_invl_page(9). Then, on e.g. pmap_remove_all(9) or pmap_remove_write(9), pmap code waits for the maximum current thread's invalidation generation number to pass the page's generation, which guarantees that all required TLB invalidations are done.

The original implementation of DI made it possible to get rid of pvh_global_lock, and only used a private mutex to handle sequential queueing of the coming and leaving threads, protecting a bounded region. A problem with that appeared e.g. in scalability benchmarks which did massive parallel unmaps, causing most of the threads to contend on DI queueing.

The current implementation of DI switched to a lock-less queue algorithm using the approach proposed by T.L. Harris and relying on double-CAS to coalesce generation count and queueing. It uses ifuncs to select either the previous locked DI or the current lock-less implementation; only old AMD Athlons, which did not implement the CMPXCHG16B instruction, fall back to the locked implementation by default.
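
The generation-count idea can be illustrated with a small single-threaded model. This is only a sketch under simplifying assumptions, not the kernel's implementation: the class and method names (DelayedInvalidation, start, mark_page, finish, must_wait) are hypothetical, there is no real concurrency, queueing or TLB, and the rule used to advance the completed generation is just one simple way to obtain the ordering guarantee described above.

```python
class DelayedInvalidation:
    """Toy model of generation-count based delayed invalidation (DI).

    Hypothetical names; real kernel code is concurrent and far more subtle.
    """

    def __init__(self):
        self.next_gen = 0        # last generation number handed out
        self.completed_gen = 0   # all DI blocks with gen <= this are finished
        self.active = set()      # generations of DI blocks still in flight
        self.page_gen = {}       # page -> highest generation that marked it

    def start(self):
        """A thread enters a DI block and acquires a generation number."""
        self.next_gen += 1
        self.active.add(self.next_gen)
        return self.next_gen

    def mark_page(self, page, gen):
        """The thread removed a pv entry of `page`; record its generation."""
        assert gen in self.active, "page marked outside of a DI block"
        self.page_gen[page] = max(self.page_gen.get(page, 0), gen)

    def finish(self, gen):
        """The thread has completed its TLB invalidations and leaves."""
        self.active.remove(gen)
        if self.active:
            # Only generations older than every in-flight block are complete.
            self.completed_gen = min(self.active) - 1
        else:
            self.completed_gen = self.next_gen

    def must_wait(self, page):
        """True while some invalidation affecting `page` may be pending."""
        return self.page_gen.get(page, 0) > self.completed_gen


if __name__ == "__main__":
    di = DelayedInvalidation()
    first = di.start()
    second = di.start()
    di.mark_page("page0", second)
    di.finish(second)
    print("wait after second finished:", di.must_wait("page0"))
    di.finish(first)
    print("wait after first finished:", di.must_wait("page0"))
```

In this model the completed generation never passes an in-flight block, so a waiter may be held back by an older block that never touched the page. That is conservative, but it keeps the check for a page down to a single comparison.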
The lock-less implementation still blocks the waiting thread on a turnstile to avoid priority-inversion issues, but in practice the wait occurs very rarely; a typical parallel buildworld generates a single-digit number of such events.

The patch got a lot of testing from Peter Holm, continuous reviews by Mark Johnston while I worked out bugs and live-lock problems in the implementation, and additional testing by Mateusz Guzik, who helped to identify a priority inversion bug with the wait.

This project was sponsored by The FreeBSD Foundation.
__________________________________________________________________

Locking changes for vnodes during execve(2)

Contact: Konstantin Belousov <kib_at_freebsd.org>

The execve(2) family of syscalls replaces the executing image in the current process. The file containing the program text, data, and arbitrary other pre-initialized segments for the newly activated image is usually called the text file. FreeBSD marks the text file as such; the mark is mutually exclusive with any opening of the file for write. In other words, a file opened for write cannot be executed, and a text file cannot be opened for write.

During execve(2) syscall processing, the kernel needs to lock the text file's vnode. This is done both to satisfy the VFS calls protocol, and to ensure that there are no incompatible parallel changes occurring to the text vnode.

A vnode can be locked either in exclusive mode, which is mutually incompatible with any other lock acquisition, or in shared mode, which is only incompatible with exclusive requests, but allows other shared owners.

In principle, there is no reason why execve(2) would need an exclusive vnode lock, since it modifies neither the content nor the metadata of the text vnode. The only exception is the marking of the vnode as text, which was done using the VV_TEXT flag in v_vflag and protected by the vnode lock. Since we modify v_vflag, the vnode lock protecting the modification should be taken exclusive.
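
The compatibility rule for the two lock modes, and the effect of every caller asking for the exclusive mode, can be sketched with a tiny model. The function names and the string mode labels are illustrative assumptions and not kernel identifiers; locks are never released here, so the model only counts how many callers could hold the vnode lock at the same time.

```python
def can_acquire(held_modes, requested):
    """Return True if `requested` is compatible with the modes already held.

    Modes are the strings "shared" and "exclusive" (illustrative labels).
    """
    if requested == "exclusive":
        # Exclusive is incompatible with any other holder.
        return len(held_modes) == 0
    if requested == "shared":
        # Shared only conflicts with an exclusive holder.
        return "exclusive" not in held_modes
    raise ValueError(f"unknown lock mode: {requested!r}")


def concurrent_holders(requests):
    """Grant requests in order, never releasing, and count the holders."""
    held = []
    for mode in requests:
        if can_acquire(held, mode):
            held.append(mode)
    return len(held)


if __name__ == "__main__":
    # Four parallel callers locking the same text vnode.
    print("exclusive:", concurrent_holders(["exclusive"] * 4))
    print("shared:   ", concurrent_holders(["shared"] * 4))
```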
The end result is that execve(2) calls on the same file are serialized. For instance, if a user runs a parallel build which executes more than one compile job, all invocations of the compiler are serialized during execve(2).

The count of opens for write is kept in another struct vnode member named v_writecount, which was protected by the vnode lock as well. Since text is mutually exclusive with an open for write, I reused v_writecount to indicate text references. Now, a negative v_writecount counts the number of text references. The v_writecount content is formally protected by the vnode interlock, but normally all mutators also own the vnode lock at least in shared mode. This way, we no longer need to acquire an exclusive text vnode lock during execve(2), removing the serializing point.

An additional positive effect is that we now account for the precise number of text references on the vnode. Before, we cleared VV_TEXT on the last unmap of the text vnode, potentially allowing an obscure DoS where mapping the text file while it is executed prevented writes until the mapping was destroyed. Now we mark the mappings for text explicitly in the vm_map_entry and adjust v_writecount by +1 (dropping one text reference) when such an entry is unmapped.

This project was sponsored by The FreeBSD Foundation.
__________________________________________________________________

Mellanox Drivers Update

Links
Mellanox OFED for FreeBSD Documentation
URL: http://www.mellanox.com/page/products_dyn?product_family=193&mtag=freebsd_driver

Contact: Slava Shwartsman, Hans Petter Selasky, Konstantin Belousov <freebsd-drivers_at_mellanox.com>

The mlx5 driver provides support for ConnectX-4 [Lx], ConnectX-5 [Ex] and ConnectX-6 [Dx] adapter cards. The mlx5en driver provides support for Ethernet adapter cards, whereas the mlx5ib driver provides support for InfiniBand adapters and RDMA over Converged Ethernet (RoCE).
The following updates were made to the mlx5 drivers:
  * 200Gb/s ConnectX-6 Ethernet: Added support for Mellanox Socket Direct Adapters which, among other capabilities, allow running at up to 200Gb/s over PCIe Gen 3.0 on a LAG interface.
  * Support for "BlueField" - Multicore System On A Chip: Added support for the RShim driver for the BlueField Multicore System On A Chip (SoC). The RShim driver provides access to the RShim resources on the BlueField target from an external host machine. The current RShim version provides device files for boot image push and virtual console access. It also creates a virtual network interface to connect to the BlueField target and provides access to internal RShim registers.
  * Firmware Burning and Diagnostics Tools: Added MSTFLINT to ports. This package contains firmware burning and diagnostic tools for Mellanox NICs, including:
      mstflint - queries and burns firmware.
      mstconfig - queries and sets non-volatile configurable options for Mellanox HCAs.
      mstregdump - dumps hardware registers from Mellanox hardware.
      mstmcra - debug utility that reads from and writes to the device configuration register space.
      mstvpd - dumps the on-card VPD.
      and more.
  * OFED-FreeBSD-v3.5.1 Upstream: Pushed upstream and MFCed the OFED-FreeBSD-v3.5.1 driver. More details on the content of this update can be found on the Mellanox OFED for FreeBSD documentation page.

General updates:
  * Submitted papers for EuroBSDcon for a joint talk with Netflix titled "Kernel TLS and TLS Hardware Offload". The papers were accepted.
  * Mellanox is working intensively to improve its cooperation with the FreeBSD community. As part of this effort, FreeBSD users are invited to propose features and enhancements to further develop and enrich the end-user experience. In addition, Mellanox continues to identify and present the right solutions to meet customers' needs.

This project was sponsored by Mellanox Technologies.
__________________________________________________________________

NFSv4.2 client/server implementation for FreeBSD

Links
Current sources
URL: https://svnweb.freebsd.org/base/projects/nfsv42

Contact: Rick Macklem <rmacklem_at_freebsd.org>

NFSv4.2 is a newer minor version of NFSv4, made up of a set of optional operations/features. A majority of these operations are related to the POSIX operations posix_fadvise(2) and posix_fallocate(2), and to lseek(2)'s support for SEEK_HOLE/SEEK_DATA. There is also a Copy operation that allows a byte range of a file to be copied to another file locally on the NFS server, avoiding data transfer over the wire in both directions. FreeBSD-current now has a Linux-compatible copy_file_range(2) syscall that will invoke this Copy operation on NFSv4.2 mounts. There is also support for MAC labelling, but it requires changes to the RPCSEC_GSS implementation to add V3 support and, as such, may not happen soon.

The implementation of NFSv4.2 (RFC 7862) is progressing nicely. At this time, the LayoutError, IOAdvise, Allocate and Copy operations have been implemented. There is still work to be done on Copy to add asynchronous support, so that large copies do not result in a long delay for the RPC's reply. The major operation that will be implemented next is Seek, so that lseek(SEEK_HOLE/SEEK_DATA) will work on NFSv4.2 mounts.

It is hoped that this implementation will be ready for FreeBSD-current/head in time for the FreeBSD-13 release. Testing is always appreciated and can be done by downloading the modified kernel from the svn repository in base/projects/nfsv42 and then building and testing it on a couple of recent FreeBSD-current systems.

If anyone is conversant with Kerberos and wants to take on the challenge of adding RPCSEC_GSS_V3 support to the kernel RPC, a patch that does that would also be greatly appreciated.
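From an application's point of view, benefiting from the Copy operation only requires calling copy_file_range(2). The Python sketch below shows one way to do so through os.copy_file_range, which Python exposes only on platforms that provide the syscall; the helper names copy_whole_file and demo are invented here, and the read/write fallback is an assumption about what a portable caller might want, not part of the NFS work. Whether the copy actually happens on the server depends on the mount, which this code cannot observe.

```python
import os
import tempfile


def copy_whole_file(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst, preferring copy_file_range(2) when available."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        copied = 0
        use_cfr = hasattr(os, "copy_file_range")
        while copied < size:
            n = 0
            if use_cfr:
                try:
                    # Explicit offsets keep the call independent of the
                    # descriptors' current file positions.
                    n = os.copy_file_range(src, dst, size - copied,
                                           copied, copied)
                except OSError:
                    use_cfr = False   # e.g. unsupported on this filesystem
            if n == 0:
                # Plain fallback: move the data through user space.
                use_cfr = False
                data = os.pread(src, min(chunk, size - copied), copied)
                if not data:
                    break             # source shrank while copying
                n = os.pwrite(dst, data, copied)
            copied += n
        return copied
    finally:
        os.close(src)
        os.close(dst)


def demo():
    payload = b"nfsv4.2 copy test\n" * 1000
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src.bin")
        dst = os.path.join(tmp, "dst.bin")
        with open(src, "wb") as f:
            f.write(payload)
        copied = copy_whole_file(src, dst)
        with open(dst, "rb") as f:
            return copied == len(payload) and f.read() == payload
```

The loop is needed because copy_file_range(2) may copy fewer bytes than requested in a single call.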
__________________________________________________________________

NUMA awareness in the FreeBSD kernel

Contact: Jeff Roberson <jeff_at_FreeBSD.org>
Contact: Andrew Gallatin <gallatin_at_FreeBSD.org>
Contact: Mark Johnston <markj_at_FreeBSD.org>

A set of patches to improve the state of NUMA awareness in the FreeBSD kernel is being developed and refined. This work also aims to generally improve the performance of FreeBSD's memory management subsystem on systems with many CPUs.

FreeBSD 12.0 featured a number of large changes which improve its performance on systems with a non-uniform memory architecture, that is, systems in which memory access latency for a given address varies depending on the CPU. Another round of improvements is being developed and will soon be available in FreeBSD-CURRENT. Short descriptions of some of these patches follow; a few have already been committed to FreeBSD-CURRENT.

In FreeBSD terminology, a memory page whose contents may not be evicted is referred to as "wired." Pages may be wired under different circumstances: for instance, all kernel memory is wired, and userland applications may request that ranges of memory be wired using the mlock(2) and mlockall(2) system calls. FreeBSD has historically defined a system-wide limit on the number of wired pages so as to avoid deadlocks that may arise when too much of a system's memory cannot be reclaimed to satisfy new memory allocations. This limit was applied only to userland wiring requests, but kernel wirings were counted against the limit, so a large source of kernel wirings could cause mlock(2) failures. This occurs frequently with a large ZFS ARC, for example. In FreeBSD-CURRENT this limit has been changed such that only userland wirings are counted against the limit; the kernel contains a number of mechanisms to apply back-pressure to kernel memory usage, so the use of a global limit on all wirings did not provide much benefit.
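The effect of the accounting change can be seen in a few lines of arithmetic. The following Python sketch is a simplified model, not the kernel's implementation: the class WiredAccounting, its fields and the page counts are all invented for illustration. It compares the historical policy, where kernel wirings consume the budget that userland requests are checked against, with the new one, where only userland wirings are counted.

```python
class WiredAccounting:
    """Toy model of the system-wide limit on wired pages."""

    def __init__(self, limit, count_kernel):
        self.limit = limit                # maximum pages counted as wired
        self.count_kernel = count_kernel  # True models the historical policy
        self.kernel_wired = 0
        self.user_wired = 0

    def kernel_wire(self, npages):
        # Kernel wirings are never refused by this limit.
        self.kernel_wired += npages

    def user_wire(self, npages):
        """Model an mlock(2)-style request; return True on success."""
        counted = self.user_wired
        if self.count_kernel:
            counted += self.kernel_wired
        if counted + npages > self.limit:
            return False
        self.user_wired += npages
        return True


def demo(count_kernel):
    acct = WiredAccounting(limit=100, count_kernel=count_kernel)
    acct.kernel_wire(95)          # e.g. a large in-kernel cache
    return acct.user_wire(10)     # small userland wiring request
```

Under these made-up numbers, demo(True) fails the userland request because the kernel has already used up most of the budget, while demo(False) succeeds.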
This change fixes a common problem on large ZFS systems, and helps enable some other architectural improvements to the code which manages page wirings.

FreeBSD has historically maintained two separate reference counters in the structure which describes a single physical page of memory. These counters initially had quite different properties, but have over time become more and more similar. Some work to merge the two counters has landed in FreeBSD-CURRENT. This does not have any user-visible effects, but it simplifies the page management code and removes a large amount of code which existed solely to transform references of one type to the other. Such code also made use of heavily contended locks, so the simplification improved kernel scalability for some workloads and has enabled further scalability improvements.

UMA is the slab allocator used in FreeBSD's kernel. It is the backend which services virtually all dynamic memory allocations performed in the kernel. The first round of NUMA improvements added NUMA awareness to the "keg" layer of UMA, which allocates and manages slabs. However, the frontend of UMA, which provides several layers of caching for objects, did not provide domain-aware caching, so over time the caches would become "polluted" with objects from different memory domains. This caching layer is now being modified to ensure that objects from different memory domains are partitioned, helping ensure that consumers can perform domain-local allocations and frees efficiently. This will enable a global "first-touch" allocation policy for UMA-managed objects.

During boot, the FreeBSD kernel allocates a number of static data structures to track physical memory. These structures have historically lived in the lowest available range of physical memory, so they may not inhabit the same NUMA domain as the memory that they track.
This is suboptimal when one tries to affinitize a workload to a particular NUMA domain: if the kernel frequently accesses page structures for local memory while executing the workload, and the page structures themselves are not placed in local memory, the kernel will perform many remote memory accesses. Some in-progress work for the amd64 platform creates multiple arrays of page tracking structures, one per NUMA domain, and ensures that each array is local to its domain. This complicates the task of initializing kernel data structures during boot, but can substantially reduce the amount of cross-domain communication that occurs while the kernel is performing useful work. Similarly, some patches to affinitize per-CPU structures are being developed; while most per-CPU memory allocations already return CPU-local memory, some structures allocated during boot are not yet properly placed with respect to the accessing CPU's memory domain.

This project was sponsored by Netflix.
__________________________________________________________________

Architectures

Updating platform-specific features and bringing in support for new hardware platforms.

Broadcom ARM64 SoC support

Contact: Michal Stanek <mst_at_semihalf.com>
Contact: Kornel Duleba <mindal_at_semihalf.com>
Contact: Marcin Wojtas <mw_at_semihalf.com>

The Semihalf team continued working on FreeBSD support for the Broadcom BCM5871X SoC series.

BCM5871X are quad-core 64-bit ARMv8 Cortex-A57 communication processors targeted at networking applications such as 10G routers, gateways, control plane processing and NAS.

Completed since the last update:
  * iProc PCIe root complex (internal and external buses): fixes and improvements, including adding a BCM58712 quirk to the GICv2m driver
  * BNXT Ethernet support: the sys/dev/bnxt.c driver has been extended to support the BCM58700 variant, and iflib was made to work without IO cache coherency

In progress:
  * Crypto engine acceleration for IPsec offloading.

Todo:
  * Upstreaming of work.
This work is expected to be submitted/merged to HEAD in the second half of 2019.

This project was sponsored by Juniper Networks, Inc.
__________________________________________________________________

NXP ARM64 SoC support

Contact: Marcin Wojtas <mw_at_semihalf.com>
Contact: Artur Rojek <ar_at_semihalf.com>

The Semihalf team started working on FreeBSD support for the NXP LS1046A SoC.

LS1046A are quad-core 64-bit ARMv8 Cortex-A72 processors with integrated packet processing acceleration and high speed peripherals including 10 Gb Ethernet, PCIe 3.0, SATA 3.0 and USB 3.0 for a wide range of networking, storage, security and industrial applications.

Already completed:
  * Platform base support (ramp-up multi-user SMP operation with UART)
  * SATA 3.0

In progress:
  * USB 3.0
  * SD/MMC
  * I2C

Todo:
  * Ethernet support
  * GPIO
  * QSPI
  * Upstreaming of developed features.

This work is expected to be submitted/merged to HEAD in Q4 2019.

This project was sponsored by Alstom Group.
__________________________________________________________________

Third-Party Projects

Many projects build upon FreeBSD or incorporate components of FreeBSD into their project. As these projects may be of interest to the broader FreeBSD community, we sometimes include brief updates submitted by these projects in our quarterly report. The FreeBSD project makes no representation as to the accuracy or veracity of any claims in these submissions.

Aberdeen Hackathon

At BSDCam in Cambridge last year we had a discussion about creating a template for hackathons in the same way we have a template for Devsummits. To test out the idea I was convinced (I swear tricked is the correct word) to host a hackathon in Aberdeen.

As a project I think we benefit a lot from hackathons, but they do take a little organisation. The worst part of this is dealing with getting money from attendees so you can pay for events.
I spoke with Deb Goodkin from the Foundation at BSDCam and we arranged to use their new EventBrite based system to handle ticketing. Overall this system made it straightforward for attendees to register and get me their details and requirements. After the event the expenses were then recouped from the Foundation. This was much easier than me putting together a custom system or even setting up and using EventBrite myself.

The hackathon went well, as you can read in Benedict's and Kristof's reports that follow, but it was less well attended than I originally expected. For hackers planning future hackathons: remember to take heed of common national holidays (we could have planned the event not to land at Easter) and expect major geopolitical events to make things unpredictable (we knew Brexit would do something, but not when).

I need to thank the University of Aberdeen for providing the location for the hackathon, and to encourage you to run a hackathon where you are. The next one should be in your home town.

Benedict Reuschling

The hackathon in Aberdeen took place in the week of Easter at the University of Aberdeen. Although only Kristof Provost (kp_at_) and myself joined our host Tom Jones, I still consider it a productive week for us. The overall theme of the hackathon was networking and each of us provided something towards that goal (be it PRs, submitting unfinished work, or other bits and pieces).

We got together on the night of Tuesday, April 16 over dinner and talked about what our plans were for the week. Kristof and I had agreed at AsiaBSDcon, when I took his tutorial about testing in FreeBSD, that we should add a chapter about it to the developers handbook. We also used our first meeting to bring each other up to date on the latest news in FreeBSD from our developers' viewpoint.

The next day, we met up at the Frazer Noble building where the hackathon was taking place. It was one of the newer buildings on campus, nicely integrated into the older houses of the city.
Since we were only a handful, we sat in Tom's office for the hackathon, which had plenty of room. He also showed us the room where we would have held the hackathon had there been more of us, and gave us a little tour. Working in a university myself, I'm always interested in how other educational institutions are structured and the rooms and equipment they provide for learning. Overall, my impression was that there is a good amount of space and equipment available, which we could have used in the hackathon.

After returning, we decided to use a special tag in the commits we would be doing to identify them as coming from this hackathon. We chose "Event:" for it, as it is a general enough term to be used at other events like conferences, too. The "Sponsored by:" line we used in the past is more for companies or individuals sponsoring certain features, so I created a review to add the "Event:" line to the committers guide. Kristof had a couple of changes to the pf chapter in the FreeBSD handbook for me, so I started going through those. I created a review for him and the commit was made there and then, making use of the short feedback cycle.

Originally, we thought about bringing in people via hangouts, but then resolved to contact people via our usual IRC channel if we needed their input. Kristof and Tom worked on some network specific stuff, whereas I started work on an initial draft for the testing chapter. We would occasionally start talking about something and then return to our work in silence. If we needed to coordinate or had questions, we simply asked and could continue once we got our answer. This provided a nice atmosphere to work in. I tackled some doc PRs while Kristof found a bug in pf and fixed it.

The afternoons were spent at different locations within walking distance. Tom made sure we got a good impression of what it is like to be a student there and that there is both taste and variety of food available.
In the evenings, Tom drove us into town to have dinner at various restaurants over the week. Aberdeen has a lot to offer as a city. Starting from the second day, Kristof and I would meet up at my hotel, which was close to the Aberdeen beach, and walk along it to the University. According to Tom, it is possible to see dolphins when the weather is right, and the Gulf Stream provides the city with enough warmth that the winters aren't as bad as you'd think this far up north.

Tom also gave us a tour of the zoological department of the university, which offered a beautiful garden with various plants and trees, as well as a museum with zoological specimens. This offered a great spot for photographs and to unwind a bit from the technical discussions we'd had. Tom also had t-shirts made for the event, which are already rare collector's items.

I had to return on Sunday, so Tom took us on a tour of the Scottish Highlands in his car the day before. We stopped at a couple of places to take pictures, and Tom would explain a lot to us, having lived there all his life. We came to Stonehaven and had fish and chips there from a take-out restaurant that had a lot of awards for sustainable fishing. This was certainly a highlight of the week and even then, we couldn't stop talking about FreeBSD and networking.

Although more people would maybe have produced more output, the three of us were certainly productive as a small group. It also made planning and coordination easier and more flexible. Tom Jones had done a lot of preparation and was an excellent guide. I would encourage him to host another such hackathon in the future and hope that next time, more people will take a trip to Aberdeen to spend some time hacking on FreeBSD.

Kristof Provost

While I'd been to Scotland before, I'd never seen Aberdeen. It's a charming city, and I enthusiastically recommend visiting.
I arrived a little while after Benedict, but made it to my hotel easily, and turned up in time to join Benedict and Tom for dinner.

Despite being small (or perhaps because of it), the hackathon was remarkably productive. Benedict and I went through the pf documentation in the handbook, so that Benedict could rework and improve it. (Benedict's doing the work, but I'm going to take credit anyway.)

Tom and I looked at the GSoC proposals and tried to find potential mentors for two promising proposals. Both of us are candidate mentors as well. We should know soon if our students are awarded slots.

Tom also proposed a patch to eliminate RFC 2675 IPv6 Jumbograms. It has my enthusiastic support.

I managed to look at a couple of open pf issues:
  * pfctl's interface_group() function checks if a name is an interface or an interface group. It still thought that interface names always ended with a number, but this assumption has been wrong for several years now. That's fixed in r346370.
  * The DIOCRSETTFLAGS ioctl() misused copyin() (it held a lock while calling it), which could result in panics.
  * That previous issue was actually discovered by my local instance of syzkaller, which I'd set up to add pf support to it. That support has now been merged, so we may see more issues detected by syzkaller soon.
  * Also for the DIOCRSETTFLAGS problem, I extended the pf tests to check for this issue.
  * The pf tests will now fail if the pft_set_rules call fails to set the rules. That didn't actually cause issues yet, but it'll make debugging tests slightly easier, and they may catch more problems now.

On Saturday Tom took us out to discover some of the pretty bits of Scotland. It turns out there are a lot of them. I can't really do it justice, but Tom has a promising career at the Scottish tourism board when this computers fad blows over.

On my way home I passed through Oslo, and took the opportunity to meet with (have lunch with) two of the EuroBSDCon local organisers.
EuroBSDCon is filling up fast; make sure to register now to secure your place!
__________________________________________________________________

Bring more Security Intelligence to FreeBSD

Links
Maltrail - distributed Malware detection
URL: https://github.com/stamparm/maltrail
Wazuh - threat detection and incident response
URL: https://wazuh.com/

Contact: Michael Muenz <m.muenz_at_gmail.com>

To bring more Security Intelligence to FreeBSD we maintain the FreeBSD port of Maltrail. This open source project, written in Python, can act as a sensor and/or as a central server. It listens on defined ports or protocols and compares IP addresses and domains against static and dynamic feeds contributed by the community. As you can install this piece of software on multiple firewalls and let them report to a central server, you are able to detect attacks and compromises very quickly. Within Q2 we updated the port to the latest version, and we are constantly in contact with the core developer (also co-author of SQLmap) to bring out new features.

The second project we are currently trying to add as a port is Wazuh. Wazuh is a fork of Ossec, which is already in the ports tree. Compared to Ossec, Wazuh has some intelligent additions such as full ELK Stack integration with its own apps and dashboards. With Wazuh installed on your webserver, or even on your Windows desktop, you can monitor file integrity or log files for most kinds of attacks. Active response features let you, for example, send API calls to your firewalls to dynamically block the offender. As Wazuh offers a complete ELK Stack you can also use it as a central logging solution for better security insights into your network.

This project was sponsored by m.a.x. Informationstechnologie AG.
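The kind of feed comparison described above for Maltrail can be pictured with a short sketch. The Python code below is not taken from Maltrail and does not use its feed format: the functions load_feed and is_suspicious, the one-entry-per-line layout with "#" comments, and the sample addresses (from the documentation ranges) are all assumptions made for this illustration.

```python
import ipaddress


def load_feed(lines):
    """Split feed entries into IP networks and lower-cased domain names."""
    networks, domains = [], set()
    for raw in lines:
        entry = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not entry:
            continue
        try:
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            domains.add(entry.lower().rstrip("."))
    return networks, domains


def is_suspicious(indicator, networks, domains):
    """Check an observed IP address or host name against the feed."""
    try:
        addr = ipaddress.ip_address(indicator)
    except ValueError:
        name = indicator.lower().rstrip(".")
        # Match the listed domain itself and any of its subdomains.
        return any(name == d or name.endswith("." + d) for d in domains)
    return any(addr in net for net in networks)


FEED = [
    "# sample entries, invented for this example",
    "192.0.2.10",
    "198.51.100.0/24",
    "bad.example",
]
```

A sensor built along these lines would load the feed once and call is_suspicious for every address or name it observes, reporting matches to the central server.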
__________________________________________________________________

libvdsk - QCOW2 implementation

Links
Github - libvdsk repo
URL: https://github.com/xcllnt/libvdsk

Contact: Sergiu Weisz <sergiu121_at_gmail.com>
Contact: Marcel Moolenaar <marcel_at_freebsd.org>
Contact: Marcelo Araujo <araujo_at_freebsd.org>
Contact: Mihai Carabas <mihai_at_freebsd.org>

This project adds support for using QCOW2 images in bhyve through the libvdsk library. The regular disk operations in bhyve were replaced with calls to libvdsk, which in turn calls the format-specific handler for each operation. To use this feature one has to install the libvdsk-enabled bhyve version along with libvdsk from the libvdsk repo linked above.

New features added:
  * Extend libvdsk to make it easier to implement new formats
  * Improve read/write performance and stability
  * Add support for Copy-On-Write

Future tasks:
  * Integrate libvdsk in bhyve Matthew Grooms
__________________________________________________________________

nsysctl 1.0

Links
gitlab.com/alfix/nsysctl
URL: https://gitlab.com/alfix/nsysctl
sysutils/nsysctl port
URL: https://www.freshports.org/sysutils/nsysctl/
Tutorial
URL: https://alfix.gitlab.io/bsd/2019/02/19/nsysctl-tutorial.html

Contact: Alfonso Sabato Siciliano <alfonso.siciliano_at_email.com>

The nsysctl utility is a /sbin/sysctl clone to get or set the kernel state, supporting libxo and extra options.

    nsysctl [--libxo=opts [-r tagname]] [-DdFGgIilmNpqTt[V|v[h[b|o|x]]]Wy] [-e sep] [-B <bufsize>] [-f filename] name[=value[,value]] ...
    nsysctl [--libxo=opts [-r tagname]] [-DdFGgIlmNpqTt[V|v[h[b|o|x]]]Wy] [-e sep] [-B <bufsize>] -A|a|X

You can use nsysctl to explore the sysctl MIB, showing the value and the info of an object. The output is explicitly selected by the options and is printed via libxo in human and machine readable formats; moreover, some values are parsed and displayed in a structured form (e.g., vm.phys_free).
Support for efi_map_header has been added but is untested; help with testing it via machdep.efi_map is welcome. Please refer to the tutorial for a more thorough description.
__________________________________________________________________