Hello!

I'm trying to find out why the new ATA DMA dump code in CURRENT fails under some conditions. My conditions are IMHO very common: I issue

    cd /usr/ports/editors/openoffice.org-2.0
    NOCLEANDEPENDS=yes make extract clean

(just to create and then delete a LOT of files) on an ASUS M5A notebook with "only" 256 MB of RAM. This reliably panics my system during the clean pass, when the softupdates code runs into a shortage of kmem_map:

    panic: kmem_malloc(4096): kmem_map too small: 82014208 total allocated

Before trying to understand how to tune my system better (alas, tuning(7) doesn't mention kmem_map at all), I'm trying to obtain a crash dump, but I'm just getting the infamous "FAILURE - out of memory in start" error in ad_strategy.

OK, it's very unwise to rely on the availability of kernel memory in situations like mine. But we can easily guard against it by preallocating a spare "struct ata_request". I've created a simple patch:

    ftp://external.atlantis.dp.ua/FreeBSD/CURRENT/nodump/ata-disk.c.patch

which solves this allocation problem and instruments the code in order to understand the code flow. Note that it's unclear to me _what_ guarantees that ad_strategy() will always finish its job, so I've added a check for BIO_DONE. I have actually seen this check fail once: my system just kept printing '.' characters (the request was never finished).

But the most serious problem is that in more than 90% of cases I don't even reach printf("}"); ! I just get another "panic: double fault" instead.

Look at the pictures DSCN1971-4 in the same folder as the patch. On the 1st picture you can see that the panic happens during the execution of ad_strategy() (there is a "{" w/o a matching "}"). On the 2nd you see the start of the 'bt' output. I've no idea about trap 0x17 - is it a stack overflow or something else?
On the 3rd you can see what the main part of the stack is filled with: a repetitive sequence of nested

    ata_start()
    ata_interrupt()
    ata_finish()
    ata_completed()

The 4th picture shows the point where the initial ad_dump() takes place.

My theory is that the ata driver tries to finish off all queued I/O requests and runs out of stack. And the question here is whether the driver should try to complete those previously queued requests at all: the OS has just crashed, so the data (and the disk block numbers!) in those requests can be invalid.

My main question is whether the dump speed increase is worth the loss of dump robustness. I think it's not.

Alas, this new dump code has already been committed to RELENG_6, so IMHO we should try to fix this issue before the upcoming 6.1-RELEASE. Being unable to obtain a crash dump can make a developer's life really difficult. IMHO we should try to make the new code robust (so it won't fail in the case of OS resource shortages), but if we fail, the good old (slow but always working) dump code should be restored.

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail: dmitry_at_atlantis.dp.ua
nic-hdl: LYNX-RIPE

Received on Sat Feb 25 2006 - 13:44:17 UTC
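
For illustration only, the spare-request idea from the message above can be sketched in user-space C. This is not the ata-disk.c patch itself and does not use the kernel's allocator: struct dump_request, request_alloc(), request_free() and the simulate_oom switch are all hypothetical names invented for this sketch. It only shows the general pattern of reserving one request in advance and handing it out when the normal allocation fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the driver's request structure. */
struct dump_request {
    unsigned long lba;    /* first block to write */
    size_t count;         /* number of blocks */
    bool from_spare;      /* true if this is the preallocated spare */
};

/* The spare lives in static storage, so it exists before any failure. */
static struct dump_request spare_request;
static bool spare_in_use = false;

/* Hypothetical switch that simulates an exhausted allocator. */
static bool simulate_oom = false;

/*
 * Try the normal allocator first; fall back to the single spare.
 * Returns NULL only when the allocator fails and the spare is busy.
 */
struct dump_request *request_alloc(void)
{
    struct dump_request *req =
        simulate_oom ? NULL : calloc(1, sizeof(*req));

    if (req != NULL)
        return req;
    if (spare_in_use)
        return NULL;

    spare_in_use = true;
    memset(&spare_request, 0, sizeof(spare_request));
    spare_request.from_spare = true;
    return &spare_request;
}

/* Release a request; the spare is marked free instead of being freed. */
void request_free(struct dump_request *req)
{
    assert(req != NULL);
    if (req == &spare_request) {
        spare_in_use = false;
        return;
    }
    free(req);
}
```

With a single spare, only one request can be outstanding while memory is exhausted. That would fit a dump path that issues one write at a time and waits for it, but whether the real dump path behaves this way is an assumption of the sketch.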