monitoring mpt raid arrays

From: Oliver Schonrock <oliver_at_realtsp.com>
Date: Wed, 12 Mar 2008 19:28:31 +0000
Hi

We have a Dell SC1435 with a SAS 5i raid controller (which is an OEM LSI 
Logic SAS1064, supported under FreeBSD by the mpt driver). The array 
works just fine, but monitoring the array is also important and there 
seems to be no support under mpt(4) right now:

http://nico.schottelius.org/documentations/freebsd/freebsd-raid-monitoring

I snooped around in the source code and found these snippets:

mpt_raid.c

const char *
mpt_disk_state(struct mpt_raid_disk *disk)
{
         switch (disk->config_page.PhysDiskStatus.State) {
         case MPI_PHYSDISK0_STATUS_ONLINE:
                 return ("Online");
         case MPI_PHYSDISK0_STATUS_MISSING:
                 return ("Missing");
         case MPI_PHYSDISK0_STATUS_NOT_COMPATIBLE:
                 return ("Incompatible");
         case MPI_PHYSDISK0_STATUS_FAILED:
                 return ("Failed");
         case MPI_PHYSDISK0_STATUS_INITIALIZING:
                 return ("Initializing");
         case MPI_PHYSDISK0_STATUS_OFFLINE_REQUESTED:
                 return ("Offline Requested");
         case MPI_PHYSDISK0_STATUS_FAILED_REQUESTED:
                 return ("Failed per Host Request");
         case MPI_PHYSDISK0_STATUS_OTHER_OFFLINE:
                 return ("Offline");
         default:
                 return ("Unknown");
         }
}

/*
  * Update in-core information about RAID support.  We update any entries
  * that didn't previously exists or have been marked as needing to
  * be updated by our event handler.  Interesting changes are displayed
  * to the console.
  */
int
mpt_refresh_raid_data(struct mpt_softc *mpt)
{


.....

mpt_disk_prt(mpt, mpt_disk, "%s\n", mpt_disk_state(mpt_disk));

....

This looks to me as though the raid controller/driver reports when 
things go wrong and how it is dealing with them. The messages printed 
by the driver end up in the dmesg output.
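
Since the state strings returned by mpt_disk_state() end up in dmesg, a
monitoring script could look for them. Below is a minimal sketch in
Python. It assumes (not verified against the driver) that such a message
line contains "mpt" and ends with one of the state strings; the function
name and the decision to treat only "Online" as healthy are my own
illustration, not anything the driver defines.

```python
# State strings copied from mpt_disk_state() in mpt_raid.c.
MPT_DISK_STATES = (
    "Online", "Missing", "Incompatible", "Failed", "Initializing",
    "Offline Requested", "Failed per Host Request", "Offline", "Unknown",
)

# Assumption for this sketch: only "Online" counts as healthy.
HEALTHY_STATES = {"Online"}


def find_unhealthy_disk_lines(dmesg_text):
    """Return (line, state) pairs for mpt lines ending in a non-healthy state."""
    hits = []
    for line in dmesg_text.splitlines():
        stripped = line.strip()
        if "mpt" not in stripped:
            continue  # not a message from the mpt driver (by assumption)
        for state in MPT_DISK_STATES:
            # Require a preceding space so that only whole words match.
            if stripped.endswith(" " + state):
                if state not in HEALTHY_STATES:
                    hits.append((stripped, state))
                break
    return hits
```

An empty result means no line matched; it does not prove the array is
healthy, because messages can scroll out of the kernel message buffer.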

Slight Aside:
-------------
We had a problem with:

         case MPI_EVENT_QUEUE_FULL:
         {
                 struct cam_sim *sim;
                 struct cam_path *tmppath;
                 struct ccb_relsim crs;
                 PTR_EVENT_DATA_QUEUE_FULL pqf =
                     (PTR_EVENT_DATA_QUEUE_FULL) msg->Data;
                 lun_id_t lun_id;

                 mpt_prt(mpt, "QUEUE FULL EVENT: Bus 0x%02x Target 0x%02x Depth "
                     "%d\n", pqf->Bus, pqf->TargetID, pqf->CurrentDepth);


which we "fixed" by writing an rc script to run camcontrol like this:
http://www.zulustips.com/2007/09/06/mpt0-queue-full-event-on-dell-sas-5ir.html

(not sure what this actually does, but it works...)

Anyway, these QUEUE FULL EVENT messages were appearing in the dmesg output.
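
To pull the numbers out of such a message, a script could mirror the
format string in the mpt_prt() call quoted above. This is only a sketch:
the function name is hypothetical, and it assumes the rest of the line
(whatever prefix the driver adds) can be ignored.

```python
import re

# Mirrors the driver's format string:
# "QUEUE FULL EVENT: Bus 0x%02x Target 0x%02x Depth %d\n"
QUEUE_FULL_RE = re.compile(
    r"QUEUE FULL EVENT: Bus 0x([0-9a-fA-F]{2,}) "
    r"Target 0x([0-9a-fA-F]{2,}) Depth (-?\d+)"
)


def parse_queue_full(line):
    """Return (bus, target, depth) as ints, or None if the line does not match."""
    match = QUEUE_FULL_RE.search(line)
    if match is None:
        return None
    bus, target, depth = match.groups()
    # Bus and target are printed in hex, depth in decimal.
    return int(bus, 16), int(target, 16), int(depth)
```

Returning None for non-matching lines lets the caller run this over
every dmesg line and keep only the hits.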

So, my very simplistic question is:

While there is no mpt CLI management interface to query the state of the 
raid array, can I just write a cron-driven script that checks the dmesg 
output every 10 minutes, say, and notifies the administrator if it finds 
any messages from mpt? Some simple grepping and diffing against 
dmesg.boot should keep this quiet unless there is really something to 
report.

Will this work? Is it a reasonable workaround for the missing raid 
monitoring support for mpt arrays?
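
To make the idea concrete, here is a rough sketch of the comparison step
in Python. It assumes the boot-time messages are in /var/run/dmesg.boot
and that any line containing "mpt" is of interest; the function names
are made up for illustration, and nothing here is tested against a real
array failure.

```python
import subprocess
from collections import Counter

DMESG_BOOT = "/var/run/dmesg.boot"  # boot-time messages saved by FreeBSD


def new_mpt_lines(boot_text, current_text, needle="mpt"):
    """Return lines mentioning `needle` that occur more often now than at boot."""
    # How many times each matching line was already present at boot.
    budget = Counter(l for l in boot_text.splitlines() if needle in l)
    fresh = []
    for line in current_text.splitlines():
        if needle not in line:
            continue
        if budget[line] > 0:
            budget[line] -= 1  # already accounted for at boot
        else:
            fresh.append(line)
    return fresh


def main():
    """Print mpt lines that appeared after boot; cron can mail the output."""
    with open(DMESG_BOOT, errors="replace") as handle:
        boot_text = handle.read()
    current_text = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    for line in new_mpt_lines(boot_text, current_text):
        print(line)
```

A cron job would call main() and rely on cron mailing any output to the
job's owner. Two caveats: the kernel message buffer is finite, so old
lines can drop out between runs, and as written the same lines would be
reported on every run until the next reboot unless the script also
remembers what it has already sent.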

Thanks in advance.


Oliver Schonrock
Received on Wed Mar 12 2008 - 18:47:37 UTC
