Don Bowman wrote:
> From: Kris Kennaway [mailto:kris_at_obsecurity.org]
>
>>On Wed, Mar 24, 2004 at 03:23:36PM -0500, Don Bowman wrote:
>>
>>>>Right, I think that's not the cause of your lockup :)
>>>
>>>Not being one to believe in coincidences... I'm typing
>>>on the serial console. The machine halts, i can no longer type.
>>>some seconds pass, out pops that message. This time too it
>>>returned. Most times (when i run two postgresql vacuums
>>>simultaneously for example), that's the end of it.
>>>
>>>I will continue to investigate.
>>
>>Check for disk problems..I have often experienced hangs or lockups on
>>machines with faulty disks.
>
> 6-disk raid 5 behind ASR. All disks report optimal, controller
> reports optimal. I know the hangs you mean, from the vm
> swapin etc which holds all the locks. I don't think this
> is they.
>
> with ahd i would get scsi sense errors in the log for machines
> with problems [CRC errors etc], i don't have a for what asr does
> in this case.
>
> ran a 96 hour memory test (memtest86), with ecc checking, there
> were no soft or hard errors. Ran machine to 40 degrees C ambient
> in environmental chamber, it's all good. It's got 3 power supplies,
> all are operational, fed from UPS.
> This is a software problem somewhere I think.
>
> I'm curious, how many people use ASR with current? It seems
> like it might be somewhat unloved.
>

It is unloved. Adaptec provides no official support for it, and I have
many more things that are a higher priority. I'm not against working on
it, but it's hard to justify it at the moment. Anyways, it wouldn't
surprise me if the controller or driver was going out to lunch and
stalling the VM, but we probably need to do a lot more investigation to
support that. I assume that you have both WITNESS and INVARIANTS turned
on?

Scott

Received on Wed Mar 24 2004 - 13:00:13 UTC