Re: My problems with stability on -current

From: Doug Barton <dougb_at_FreeBSD.org>
Date: Wed, 11 May 2011 23:19:25 -0700
On 05/11/2011 04:33, Alexander Motin wrote:
> On 11.05.2011 08:17, Doug Barton wrote:
>> I had an interesting result from doing nothing but switching from HPET to
>> LAPIC ... no crash. Still on the same version of -current (r221566), the
>> only thing I've done is add kern.eventtimer.timer="LAPIC" to
>> /boot/loader.conf, and so far I haven't been able to get it to crash no
>> matter how much I compile or how much other stuff I do in the
>> background. I _can_ load the system heavily enough that the
>> mouse drags across the screen, windows take visible time to repaint,
>> etc. That happens with a load average of 4+ on this Core 2 Duo. But
>> other than that (which is not altogether unreasonable), the system has
>> been very stable for a couple of days now.
>>
>> Does that suggest a next step in terms of what to test?
>
> The fact that LAPIC is working fine can mean that the problem is either
> HPET-specific or specific to non-per-CPU timers. To check that, you could
> try the i8254 timer in one-shot mode:
> hint.attimer.0.timecounter=0
> kern.eventtimer.timer="i8254"
>
> or use HPET in per-CPU mode:
> hint.atrtc.0.clock=0
> hint.attimer.0.clock=0
> hint.hpet.X.legacy_route=1
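For the i8254 suggestion, the relevant arithmetic is that the PC's i8254 counts down at 1193182 Hz through a 16-bit register, so a single one-shot period tops out at 65535 / 1193182 s, about 54.9 ms; an event timer built on it has to be re-armed at least that often. The helper below is a hypothetical sketch (pit_oneshot_count is an illustrative name, not FreeBSD's attimer code) that converts a timeout in microseconds to an i8254 count, rounding up and clamping to the 16-bit range.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PC i8254 (PIT) input clock, in Hz. */
#define PIT_HZ 1193182u

/*
 * Hypothetical helper: convert a one-shot timeout in microseconds into an
 * i8254 count. The counter is 16 bits wide, so requests longer than about
 * 54.9 ms are clamped and the caller must re-arm the timer afterwards.
 */
static uint16_t pit_oneshot_count(uint32_t usec)
{
    /* Round up so the timer never fires earlier than requested. */
    uint64_t count = ((uint64_t)usec * PIT_HZ + 999999u) / 1000000u;

    if (count == 0)
        count = 1;          /* shortest programmable delay */
    if (count > 0xFFFFu)
        count = 0xFFFFu;    /* 16-bit limit; caller re-arms for the rest */
    assert(count >= 1 && count <= 0xFFFFu); /* 0 would mean 65536 to the i8254 */
    return (uint16_t)count;
}
```

For example, a 1 ms timeout maps to a count of 1194, while anything above roughly 54.9 ms is clamped to 0xFFFF.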
>
> But the most informative thing would be to see what's going on with HPET
> interrupts during the freezes. With HPET hardware it is very easy to
> lose an interrupt, and a lost interrupt causes problems for many things.
> There are some workarounds in place for that, but I can't be sure they
> are enough. For that case you could experiment with this patch:
> --- acpi_hpet.c.prev 2010-12-25 11:28:45.000000000 +0200
> +++ acpi_hpet.c 2011-05-11 14:30:59.000000000 +0300
> @@ -190,7 +190,7 @@ restart:
> bus_write_4(sc->mem_res, HPET_TIMER_COMPARATOR(t->num),
> t->next);
> }
> - if (fdiv < 5000) {
> + if (1 || fdiv < 5000) {
> bus_read_4(sc->mem_res, HPET_TIMER_COMPARATOR(t->num));
> now = bus_read_4(sc->mem_res, HPET_MAIN_COUNTER);
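The patch above makes the driver's post-write check run for every timeout instead of only short ones (fdiv < 5000). The race it guards against can be sketched as a simulation, assuming (this is an assumption, not a reading of the full acpi_hpet.c) that the check compares the main counter with the freshly written comparator and retries with a longer timeout on a miss. An HPET comparator fires when it matches a free-running counter, so if the counter has already passed the target by the time the write lands, no interrupt arrives until the counter wraps. The names sim_hpet, write_cost, and arm_oneshot are hypothetical, and doubling the delay is one plausible back-off, not necessarily the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a free-running 32-bit HPET main counter. */
struct sim_hpet {
    uint32_t counter;    /* current main counter value */
    uint32_t write_cost; /* ticks that elapse while the comparator write lands (assumed) */
};

/* Wraparound-safe test: has `now` reached or passed `target`? */
static int deadline_passed(uint32_t now, uint32_t target)
{
    return (int32_t)(now - target) >= 0;
}

/*
 * Arm a one-shot comparator `delta` ticks ahead. After the write, re-read
 * the counter; if it is already at or past the target, the comparator
 * would stay silent until wraparound (a "lost" interrupt), so double the
 * delay and try again. Returns the number of attempts made and stores the
 * comparator value finally armed in *armed.
 */
static int arm_oneshot(struct sim_hpet *h, uint32_t delta, uint32_t *armed)
{
    int attempts = 0;

    assert(delta != 0);
    for (;;) {
        uint32_t target = h->counter + delta; /* comparator value to program */

        attempts++;
        h->counter += h->write_cost;          /* time passes during the write */
        if (!deadline_passed(h->counter, target)) {
            *armed = target;
            return attempts;
        }
        delta *= 2;                           /* too close: back off and retry */
    }
}
```

With a write cost of 10 ticks, a request 4 ticks ahead misses twice and is armed on the third attempt with a 16-tick delay; the signed difference in deadline_passed keeps the check correct when the target wraps past 0xFFFFFFFF.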

OK, I'll try the patch soon; there's a lot going on right now. FYI,
something odd happened tonight: the laptop had been up for about 36
hours and sat idle while I was AFK for about an hour. When I came back,
the system was off. Nothing in the logs and no core dump, but it
definitely crashed, because when I turned it back on the file systems
were all dirty. This is still r221566 running LAPIC.

Interestingly, I had Pidgin running while it was idle, and a friend sent
me an e-mail saying that he had tried to IM me, and as soon as he sent
the message my status went from "away" to "offline." The time he sent
the e-mail corresponds roughly to the last log entry before I rebooted.
I realize that this is not a lot to go on, but I thought I'd mention it.


Doug

-- 

	Nothin' ever doesn't change, but nothin' changes much.
			-- OK Go

	Breadth of IT experience, and depth of knowledge in the DNS.
	Yours for the right price.  :)  http://SupersetSolutions.com/
Received on Thu May 12 2011 - 04:19:27 UTC
