Re: CURRENT issue with balloon.c

From: Roger Pau Monné <roger.pau_at_citrix.com>
Date: Mon, 28 Oct 2013 11:41:20 +0100
On 25/10/13 00:24, Outback Dingo wrote:
> On Thu, Oct 24, 2013 at 6:17 PM, Roger Pau Monné <roger.pau_at_citrix.com> wrote:
> 
>> On 24/10/13 22:15, Konstantin Belousov wrote:
>>> On Thu, Oct 24, 2013 at 09:45:20PM +0100, Roger Pau Monné wrote:
>>>> On 24/10/13 13:01, Outback Dingo wrote:
>>>>>
>>>>>
>>>>> On Thu, Oct 24, 2013 at 6:16 AM, Roger Pau Monné <roger.pau_at_citrix.com
>>>>> <mailto:roger.pau_at_citrix.com>> wrote:
>>>>>
>>>>>     On 24/10/13 03:02, Outback Dingo wrote:
>>>>>     > --- trap 0, rip = 0, rsp = 0xfffffe00002c6b70, rbp = 0 ---
>>>>>     > uma_zalloc_arg: zone "16" with the following non-sleepable locks held:
>>>>>     > exclusive sleep mutex balloon_lock (balloon_lock) r = 0 (0xffffffff816e9c58) locked @ /usr/src/sys/dev/xen/balloon/balloon.c:339
>>>>>     > exclusive sleep mutex balloon_mutex (balloon_mutex) r = 0 (0xffffffff816e9c38) locked @ /usr/src/sys/dev/xen/balloon/balloon.c:373
>>>>>     > KDB: stack backtrace:
>>>>>     > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00002c67c0
>>>>>     > kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe00002c6870
>>>>>     > witness_warn() at witness_warn+0x4a8/frame 0xfffffe00002c6930
>>>>>     > uma_zalloc_arg() at uma_zalloc_arg+0x3b/frame 0xfffffe00002c69a0
>>>>>     > malloc() at malloc+0x101/frame 0xfffffe00002c69f0
>>>>>     > balloon_process() at balloon_process+0x44a/frame 0xfffffe00002c6a70
>>>>>     > fork_exit() at fork_exit+0x84/frame 0xfffffe00002c6ab0
>>>>>     > fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00002c6ab0
>>>>>
>>>>>     Did you do anything specific to trigger the crash? Can you explain the
>>>>>     steps needed to reproduce it?
>>>>>
>>>>>
>>>>> I just recompiled a kernel and booted it. It scrolls continuously across
>>>>> the screen and doesn't seem to ever stop.
>>>>
>>>> I've tried r257051 and it seems to work fine. Could you please post your
>>>> Xen version, the config file used to launch the VM and the toolstack used?
>>>
>>> Do you have WITNESS enabled in your kernel config?
>>
>> Yes, but I'm not touching the balloon memory target.
>>
>>> There is an obvious case of calling malloc(M_WAITOK) while holding both
>>> balloon_lock and balloon_mutex:
>>> balloon_process->decrease_reservation->balloon_append.
>>
>> Yes, I'm aware of that. It just shouldn't happen unless you actually
>> trigger a balloon memory decrease, which AFAIK does not happen
>> automatically; that's why I was asking whether this was happening
>> without the user explicitly requesting it.
>>
>> Anyway, this clearly should be fixed and pulled into 10, no matter what
>> triggered it. I will send a patch as soon as possible.
>>
>>
> Yes, WITNESS was enabled. I'm using Kubuntu, the Xen kernel and
> virt-manager. It was fine running CURRENT until I ran updates this week,
> then I encountered this. I've since disabled WITNESS with a recompile, but
> the VM still appears more sluggish than before.
> 
> root@M14xR2:/home/dingo# xm info
> host                   : M14xR2
> release                : 3.11.0-12-generic
> version                : #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013
> machine                : x86_64
> nr_cpus                : 4
> nr_nodes               : 1
> cores_per_socket       : 2
> threads_per_core       : 2
> cpu_mhz                : 2494
> xen:/// capabilities:
> <capabilities>
> 
>   <host>
>     <cpu>
>       <arch>x86_64</arch>
>       <features>
>         <pae/>
>       </features>
>     </cpu>
>     <power_management>
>       <suspend_mem/>
>       <suspend_disk/>
>       <suspend_hybrid/>
>     </power_management>
>     <migration_features>
>       <live/>
>       <uri_transports>
>         <uri_transport>xenmigr</uri_transport>
>       </uri_transports>
>     </migration_features>
>   </host>
> 
>   <guest>
>     <os_type>xen</os_type>
>     <arch name='x86_64'>
>       <wordsize>64</wordsize>
>       <emulator>qemu-dm</emulator>
>       <machine>xenpv</machine>
>       <domain type='xen'>
>       </domain>
>     </arch>
>   </guest>
> 
>   <guest>
>     <os_type>xen</os_type>
>     <arch name='i686'>
>       <wordsize>32</wordsize>
>       <emulator>qemu-dm</emulator>
>       <machine>xenpv</machine>
>       <domain type='xen'>
>       </domain>
>     </arch>
>     <features>
>       <pae/>
>     </features>
>   </guest>
> 
>   <guest>
>     <os_type>hvm</os_type>
>     <arch name='i686'>
>       <wordsize>32</wordsize>
>       <emulator>qemu-dm</emulator>
>       <loader>hvmloader</loader>
>       <machine>xenfv</machine>
>       <domain type='xen'>
>       </domain>
>     </arch>
>     <features>
>       <pae/>
>       <nonpae/>
>       <acpi default='on' toggle='yes'/>
>       <apic default='on' toggle='no'/>
>       <hap default='off' toggle='yes'/>
>       <viridian default='off' toggle='yes'/>
>     </features>
>   </guest>
> 
>   <guest>
>     <os_type>hvm</os_type>
>     <arch name='x86_64'>
>       <wordsize>64</wordsize>
>       <emulator>qemu-dm</emulator>
>       <loader>hvmloader</loader>
>       <machine>xenfv</machine>
>       <domain type='xen'>
>       </domain>
>     </arch>
>     <features>
>       <acpi default='on' toggle='yes'/>
>       <apic default='on' toggle='no'/>
>       <hap default='off' toggle='yes'/>
>       <viridian default='off' toggle='yes'/>
>     </features>
>   </guest>
> 
> </capabilities>
> 
> 
> 
> hw_caps                : bfebfbff:28100800:00000000:00007f00:77bae3bf:00000000:00000001:00000281
> virt_caps              : hvm
> total_memory           : 8074
> free_memory            : 12
> free_cpus              : 0
> xen_major              : 4
> xen_minor              : 3
> xen_extra              : .0
> xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
> xen_scheduler          : credit
> xen_pagesize           : 4096
> platform_params        : virt_start=0xffff800000000000
> xen_changeset          :
> xen_commandline        : placeholder
> cc_compiler            : gcc (Ubuntu/Linaro 4.8.1-10ubuntu5) 4.8.1
> cc_compile_by          : stefan.bader
> cc_compile_domain      : canonical.com
> cc_compile_date        : Wed Oct  2 11:17:12 UTC 2013
> xend_config_format     : 4
> root@M14xR2:/home/dingo# uname -a
> Linux M14xR2 3.11.0-12-generic #19-Ubuntu SMP Wed Oct 9 16:20:46 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I've never used libvirt, so I'm not sure why it is triggering a balloon
update without the user requesting it (maybe there are too many guests
running on the same host?).

Anyway, the attached patch should fix the issues with the balloon
driver. It contains some cleanup of the code, the fixes to avoid
calling malloc with M_WAITOK while holding the mutex, and a fix for
properly accounting for memory. Right now, if we reduce memory to 900MB,
for example, FreeBSD keeps using more memory than that, because physmem
doesn't account for the memory where the kernel is mapped or for the MP
boot stacks.

Could you please try it and report the results?


Received on Mon Oct 28 2013 - 09:41:25 UTC
