Re: Accessing SCSI-Devices >2TB

From: Kenneth D. Merry <ken_at_freebsd.org>
Date: Sat, 11 Jun 2005 21:54:40 -0600

On Sun, Jun 12, 2005 at 00:25:08 +0200, Raphael H. Becker wrote:
> On Fri, Jun 10, 2005 at 09:07:18AM -0600, Kenneth D. Merry wrote:
> 
> > > and here the relevant diffs:
> > > http://rabe.uugrn.org/temp/FreeBSD/bigraid/dmesg.knoppix_diff.txt
> > 
> > This is quite interesting:
> [....]
> > Linux notices that the device returned 0xffffffff as the capacity in
> > response to a READ CAPACITY(10) command, so it tries a READ CAPACITY(16)
> > command, which *fails*.
> > 
> > So even under Linux you aren't getting the full capacity of your device,
> > you're only getting 2TB.
> 
> Support told me that SuSE Linux is known to work with >2TB on one device,
> which means they might have some patches to work around this. I will try a
> SuSE live system in the next few days just to make sure it works. But the
> system won't be running SuSE in the future.

It would be interesting to see whether that works.  That would help narrow
down the problem slightly.

> > > Second I rebooted FreeBSD with CAMDEBUG in kernel and enabled it via
> > > "camcontrol debug ..." and did a "camcontrol rescan 1" then:
> > > http://rabe.uugrn.org/temp/FreeBSD/bigraid/freebsd54_camdebug.txt
> > 
> > camcontrol debug -I isn't quite what we need in this situation.  Instead,
> > you should try 'camcontrol debug -c'.
> 
> # camcontrol debug -c 1:0
> # camcontrol rescan 1
> Re-scan of bus 1 was successful
> 
> in /var/log/messages:
> 
> kernel: (probe0:ahc1:0:0:0): TEST UNIT READY.  CDB: 0 0 0 0 0 0
> kernel: (probe0:ahc1:0:0:0): INQUIRY. CDB: 12 0 0 0 24 0
> kernel: (probe0:ahc1:0:0:0): INQUIRY. CDB: 12 0 0 0 fc 0
> kernel: (probe0:ahc1:0:0:0): MODE SENSE(06).  CDB: 1a 0 a 0 14 0
> kernel: (probe0:ahc1:0:0:0): INQUIRY. CDB: 12 1 80 0 ff 0
> kernel: (probe0:ahc1:0:0:1): INQUIRY. CDB: 12 20 0 0 24 0
> kernel: (probe0:ahc1:0:0:2): INQUIRY. CDB: 12 40 0 0 24 0
> kernel: (probe0:ahc1:0:0:3): INQUIRY. CDB: 12 60 0 0 24 0
> kernel: (probe0:ahc1:0:0:4): INQUIRY. CDB: 12 80 0 0 24 0
> kernel: (probe0:ahc1:0:0:5): INQUIRY. CDB: 12 a0 0 0 24 0
> kernel: (probe0:ahc1:0:0:6): INQUIRY. CDB: 12 c0 0 0 24 0
> kernel: (probe0:ahc1:0:0:7): INQUIRY. CDB: 12 e0 0 0 24 0
> 
> Does not say anything to me.

Hmm, well, you're not going to see the problem CDB that way, because the
probe has already happened.  To see it, you either need to compile in the
debugging flags, or do the following:

- unplug the cable from the machine to the RAID array
- camcontrol rescan 1
- plug the cable back in
- camcontrol rescan 1

> > > Any idea what's wrong with it?
> > 
> > From what I can see, it's likely the device is misbehaving.  The fact that
> > the 16 byte read capacity fails under Linux is telling.  If you've got a
> > device that supports a LUN size greater than 2TB, it must support the 16
> > byte read capacity and read/write commands.
> 
> So you would say this is a misbehaviour of the RAID's firmware/controller?

It's either the RAID box or the ahc driver from what I can see at this
point.  See below.

> > Here are some more things you can try.  Does your system boot?  
> Well, that RAID is just one of 3 RAIDs, the system is on the internal PERC-RAID.
> 
> > If so, we
> > can try sending a few commands to the device via the pass(4) driver and see
> > what happens.
>  
> > First, run 'camcontrol devlist' and see if the array is there and whether
> > there is a pass device attached.  If so, try this:
> > 
> > camcontrol cmd passX -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"
> 
> <IFT A12U-G2421 342D>              at scbus1 target 0 lun 0 (pass3)
> # camcontrol cmd pass3 -v -c "25 0 0 0 0 0 0 0 0 0" -i 8 "i4 i4"
> -1 512 

Okay, that's good.  The -1 means that the RAID box is telling us that we
need to send the 16 byte read capacity command to get the true capacity.
(That's what a capacity of 0xffffffff, or -1, means.)
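
To make the sentinel concrete, here is a minimal Python sketch of how the
8 bytes returned by READ CAPACITY(10) decode.  The function name and the
sample payload are made up for illustration (the payload simply encodes the
"-1 512" result above); this is not code from da(4) or camcontrol.

```python
import struct

# READ CAPACITY(10) returns 8 bytes, big-endian:
#   bytes 0-3: address of the last logical block
#   bytes 4-7: block length in bytes
TOO_BIG_SENTINEL = 0xFFFFFFFF  # "capacity does not fit in 32 bits"


def parse_read_capacity_10(data: bytes):
    """Hypothetical helper: decode a READ CAPACITY(10) payload."""
    last_lba, block_len = struct.unpack(">II", data[:8])
    needs_read_capacity_16 = last_lba == TOO_BIG_SENTINEL
    return last_lba, block_len, needs_read_capacity_16


# Illustrative payload matching the "-1 512" output above.
payload = bytes.fromhex("ffffffff00000200")
last_lba, block_len, needs_16 = parse_read_capacity_10(payload)

# The same four bytes read as a signed 32-bit integer give -1.
signed_view = struct.unpack(">i", payload[:4])[0]

# A 32-bit block address with 512-byte blocks tops out at about 2 TiB.
limit_bytes = 2**32 * block_len

print(hex(last_lba), block_len, needs_16)  # 0xffffffff 512 True
print(signed_view)                         # -1
print(limit_bytes)                         # 2199023255552
```

The last figure is where the 2TB ceiling comes from: with 512-byte blocks, a
32-bit block address cannot describe anything larger.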

> > That will send a standard 10 byte read capacity command to the device.
> > Next, try a 16 byte read capacity.  This is where things are likely failing
> > in the da(4) driver attach, and apparently where things are failing under
> > Linux:
> > 
> > camcontrol cmd passX -v -c "9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0" -i 12 "i4 i4 i4"
> 
> # camcontrol cmd pass3 -v -c "9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0" -i 12 "i4 i4 i4"
> camcontrol: error sending command
> (pass3:ahc1:0:0:0): SERVICE ACTION IN(16). CDB: 9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0 
> (pass3:ahc1:0:0:0): CAM Status: Target Bus Phase Sequence Failure
> 
> dmesg:
> (pass3:ahc1:0:0:0): No or incomplete CDB sent to device.
> (pass3:ahc1:0:0:0): Protocol violation in Message-in phase.  Attempting to abort.
> (pass3:ahc1:0:0:0): Abort Tag Message Sent
> (pass3:ahc1:0:0:0): SCB 8 - Abort Tag Completed.

Hmm, okay, at this point, we have a SCSI protocol violation.  (Which is the
same thing you saw before.)

So this pretty much means it is the 16 byte read capacity that is
triggering the problem.
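
For reference, a small Python sketch of how that 16-byte CDB is laid out,
and how the 12 bytes requested with "-i 12" would decode if the command
succeeded.  The helper names and the sample capacity are hypothetical; the
array in this thread never returned this data, so the payload below is
invented purely to show the decoding.

```python
import struct

SERVICE_ACTION_IN_16 = 0x9E  # opcode shared by several service actions
READ_CAPACITY_16_SA = 0x10   # service action selecting READ CAPACITY(16)


def build_read_capacity_16_cdb(alloc_len: int = 12) -> bytes:
    """Hypothetical helper: build the 16-byte CDB.

    Layout: opcode, service action, 8-byte LBA, 4-byte allocation
    length, byte 14 (PMI/reserved), control byte.
    """
    return struct.pack(">BBQIBB", SERVICE_ACTION_IN_16,
                       READ_CAPACITY_16_SA, 0, alloc_len, 0, 0)


def parse_read_capacity_16(data: bytes):
    """Hypothetical helper: decode the first 12 bytes of the response
    (8-byte last block address, 4-byte block length, big-endian)."""
    last_lba, block_len = struct.unpack(">QI", data[:12])
    return last_lba, block_len, (last_lba + 1) * block_len


cdb = build_read_capacity_16_cdb()
cdb_text = " ".join(format(b, "x") for b in cdb)
print(cdb_text)  # 9e 10 0 0 0 0 0 0 0 0 0 0 0 c 0 0

# Invented response for an array of 6,000,000,000 blocks of 512 bytes.
fake_payload = struct.pack(">QI", 5_999_999_999, 512)
print(parse_read_capacity_16(fake_payload))
# (5999999999, 512, 3072000000000)
```

The printed CDB matches the bytes passed to camcontrol above, and the 64-bit
block address is what lets the device report more than 2TB.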

The question is, is this problem on the RAID box or in the ahc driver?  I
would lean towards saying the RAID box has the issue, but Justin (CCed) may
be able to give a little more insight.

> > If that works, there is some other problem.  If it fails, then we're
> > fairly close to the problem.
> 
> So, if it's a problem with the RAID's firmware and/or maybe hardware,
> do you expect there's a workaround in FreeBSD for it? 

It's either a problem with the firmware on the RAID controller or with the
ahc driver.  If it turns out that the RAID controller is at fault, then
you'll need to get fixed firmware.

It'll be interesting to see whether a SuSE live system works with it.  (And
reports a capacity that is greater than 2TB.)

Ken
-- 
Kenneth Merry
ken_at_FreeBSD.ORG
Received on Sun Jun 12 2005 - 01:54:46 UTC
