Re: monitoring mpt raid arrays

From: John Baldwin <jhb_at_freebsd.org>
Date: Wed, 12 Mar 2008 16:03:33 -0400
On Wednesday 12 March 2008 03:28:31 pm Oliver Schonrock wrote:
> Hi
> 
> We have a Dell SC1435 with a SAS 5i raid controller (which is an OEM LSI 
> Logic SAS1064, supported under FreeBSD by the mpt driver). The array 
> works just fine, but monitoring the array is also important and there 
> seems to be no support under mpt(4) right now:
> 
> http://nico.schottelius.org/documentations/freebsd/freebsd-raid-monitoring
> 
> I snooped around in the source code and found these snippets
> 
> mpt_raid.c
> 
> const char *
> mpt_disk_state(struct mpt_raid_disk *disk)
> {
>          switch (disk->config_page.PhysDiskStatus.State) {
>          case MPI_PHYSDISK0_STATUS_ONLINE:
>                  return ("Online");
>          case MPI_PHYSDISK0_STATUS_MISSING:
>                  return ("Missing");
>          case MPI_PHYSDISK0_STATUS_NOT_COMPATIBLE:
>                  return ("Incompatible");
>          case MPI_PHYSDISK0_STATUS_FAILED:
>                  return ("Failed");
>          case MPI_PHYSDISK0_STATUS_INITIALIZING:
>                  return ("Initializing");
>          case MPI_PHYSDISK0_STATUS_OFFLINE_REQUESTED:
>                  return ("Offline Requested");
>          case MPI_PHYSDISK0_STATUS_FAILED_REQUESTED:
>                  return ("Failed per Host Request");
>          case MPI_PHYSDISK0_STATUS_OTHER_OFFLINE:
>                  return ("Offline");
>          default:
>                  return ("Unknown");
>          }
> }
> 
> /*
>   * Update in-core information about RAID support.  We update any entries
>   * that didn't previously exists or have been marked as needing to
>   * be updated by our event handler.  Interesting changes are displayed
>   * to the console.
>   */
> int
> mpt_refresh_raid_data(struct mpt_softc *mpt)
> {
> 
> 
> .....
> 
> mpt_disk_prt(mpt, mpt_disk, "%s\n", mpt_disk_state(mpt_disk));
> 
> ....
> 
> Which looks to me like the raid controller/driver would report when 
> things go wrong and how it is dealing with it etc. The messages printed 
> by the driver make it into dmesg output.
> 
> Slight Aside:
> -------------
> We had a problem with:
> 
>          case MPI_EVENT_QUEUE_FULL:
>          {
>                  struct cam_sim *sim;
>                  struct cam_path *tmppath;
>                  struct ccb_relsim crs;
>                  PTR_EVENT_DATA_QUEUE_FULL pqf =
>                      (PTR_EVENT_DATA_QUEUE_FULL) msg->Data;
>                  lun_id_t lun_id;
> 
>                  mpt_prt(mpt, "QUEUE FULL EVENT: Bus 0x%02x Target 0x%02x Depth "
>                      "%d\n", pqf->Bus, pqf->TargetID, pqf->CurrentDepth);
> 
> 
> which we "fixed" by writing an rc script to run camcontrol like this:
> 
> http://www.zulustips.com/2007/09/06/mpt0-queue-full-event-on-dell-sas-5ir.html
> 
> (not sure what this actually does, but it works...)
> 
> Anyway, these QUEUE FULL EVENT messages were appearing in dmesg output.
> 
> So, my very simplistic question is:
> 
> While there is no mpt cli management interface to query the state of the 
> raid array, can I just write a cron driven script which checks dmesg 
> output every 10min, say, and notifies the administrator if it finds any 
> messages from mpt? Some simple grepping and diff'ing against dmesg.boot 
> should be able to keep this quiet unless there is really something to 
> report.
> 
> Will this work? Is it a reasonable "work around" for the missing raid 
> monitoring support for mpt arrays?
> 
> Thanks in advance.

The problem is that on your box the mpt_raid stuff isn't working because it 
probes for the RAID metadata too early, so you don't get any of the raid 
messages.
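
For a box where the driver's messages do reach the kernel message buffer, 
the cron approach described in the question could look roughly like the 
sketch below. It is only an illustration, not something tested against a 
real array. It assumes that driver lines start with the device name 
followed by a colon (for example "mpt0:"), that /var/run/dmesg.boot holds 
the boot-time baseline, and that cron mails any output to the job's owner. 
The function name new_mpt_lines is made up for this example.

```python
import re
import subprocess
import sys

# Assumption: driver messages begin with the device name, e.g. "mpt0: ...".
MPT_LINE = re.compile(r"^mpt\d+:")


def new_mpt_lines(baseline_text, current_text):
    """Return mpt lines found in current_text but not in baseline_text."""
    baseline = {line.strip() for line in baseline_text.splitlines()}
    fresh = []
    for line in current_text.splitlines():
        line = line.strip()
        if MPT_LINE.match(line) and line not in baseline:
            fresh.append(line)
    return fresh


def main():
    # Boot-time messages saved by the system; used as the "quiet" baseline.
    with open("/var/run/dmesg.boot", errors="replace") as f:
        baseline = f.read()
    # Current contents of the kernel message buffer.
    current = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    ).stdout
    fresh = new_mpt_lines(baseline, current)
    if fresh:
        # Printing is enough when run from cron: output is mailed to the owner.
        print("\n".join(fresh))
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

As written, this reports the same lines on every run until the next reboot. 
Remembering what has already been reported would need a small state file, 
which is left out here.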

-- 
John Baldwin
Received on Wed Mar 12 2008 - 19:29:19 UTC
