Re: svn commit: r352558 - head/usr.bin/top

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 10 Jul 2020 15:25:06 -0700
On 2020-Jul-10, at 11:05, Yuri Pankov <yuripv at yuripv.dev> wrote:

> Steve Wills wrote:
>> On 11/28/19 4:08 PM, Mark Millard via svn-src-head wrote:
>>>> Author: daichi
>>>> Date: Fri Sep 20 17:37:23 2019
>>>> New Revision: 352558
>>>> URL:
>>>> https://svnweb.freebsd.org/changeset/base/352558
>>>> 
>>>> 
>>>> Log:
>>>>    top(1): support multibyte characters in command names (ARGV array)
>>>>    depending on locale.
>>>>     - add setlocale()
>>>>     - remove printable() function
>>>>     - add VIS_OCTAL and VIS_SAFE to the flag of strvisx() to display
>>>>       non-printable characters that do not use C-style backslash sequences
>>>>       in three digit octal sequence, or remove it
>>>>    This change allows multibyte characters to be displayed according to
>>>>    locale. If it is recognized as a non-display character according to the
>>>>    locale, it is displayed in three digit octal sequence.
>>>> 
>>> 
>>> Initially picking on tab characters as an example of what is
>>> probably a somewhat broader issue . . .
>>> 
>>> Ever since this change, characters like tabs that do not fit
>>> in the next character cell when output, but for which they
>>> are !isprintable(...), now mess up the top display. Again
>>> using tab as an example: line wrapping from the text having
>>> been shifted over by more than one character cell. top does
>>> not track the line wrapping result in how it decides what
>>> to output for the following display updates.
>>> 
>> Apologies for the way late reply here, but I just now bothered tracking this down. This commit seems to be the cause of some corruption I'm seeing in long running top(1) as well. As Mark mentions, if I use "hh" it clears up. Should I open a bugzilla bug? I can share screenshots of the corruption, such as:
>> https://i.imgur.com/Xqlwf9h.png
>> https://i.imgur.com/Jv0d5NU.png
> 
> Does removing VIS_SAFE fixes the issue for you?
> 
> As for original Mark's report (which I missed), removing isprintable() doesn't look wrong as vis(3) should take of its functionality (and in multibyte-aware way).

vis (as used) and the old isprintable logic are not
equivalent when multi-byte is not needed/involved.
Otherwise I'd not have had anything to ever report.
If vis can do what is needed, more work needed to
be done when the change was made in order to avoid
msesed up displays in single-byte contexts.

> Also, is there an easy way to reproduce this?

The following sort of command (the empty space inside quoted
text are tab characters):

# tr '0\n      1\n     2\n     3\n     4\n     5\n     6\n     7\n     8\n' '\t0       \t1     \t2     \t3     \t4     \t5     \t6     \t7     \t8' < /dev/zero > /dev/null

causes my 200 character wide window running top to show:

32920 root        100    0  12764Ki    2420Ki CPU3     3   2:22  99.87% tr 0\\n	1\\n	2\\n	3\\n	4\\n	5\\n	6\\n	7\\n	8\\n \\t0	\\t1	\\t2	\\t3	\\t4	\\t5	\\t6	\\t733   \\t8       20        7172      5448Ki CPU23   23   0:00   0.04% top -HiSCazopid

But that does not show where the lines wrap at the edges of the window,
so breaking it up explicitly after the first "\" in \\7:

32920 root        100    0  12764Ki    2420Ki CPU3     3   2:22  99.87% tr 0\\n	1\\n	2\\n	3\\n	4\\n	5\\n	6\\n	7\\n	8\\n \\t0	\\t1	\\t2	\\t3	\\t4	\\t5	\\t6	\
\t733   \\t8       20        7172      5448Ki CPU23   23   0:00   0.04% top -HiSCazopid

Note how \n turned into \\n , taking an extra character for
each \n . Similarly for \t vs. \\t . (Other examples do
similarly.)

The tab characters really do use more than one character cell
on the display (sometimes).

The text from the tr command ends up spread across 2 lines
as things look like in the window where top is running.

I ran top in another ssh session first and then the tr command.
Before running the tr command, top showed as:

33019 root         20    0  17172Ki    5448Ki CPU24   24   0:00   0.05% top -HiSCazopid

If you do not end up with top listed just after tr in top's output,
then it will not be top's line that ends up partially overwritten.

If you have wider windows, you may need more text in the tr quoted
strings.

In another experiment I inserted a large number of backspace characters
(control-H's) at the front of the first quoted string in the tr command.
The top output displayed:

0\\n5 ro1\\n    2\\93   3\\n12764\\n   25\\ni CP6\\n   97\\n:12 100.00\t0r \nHiS\\t1pid \\t2	\\t3	\\t4	\\t5	\\t6	\\t
33094 root         20    0  17172Ki    5488Ki CPU21   21   0:00   0.06% top -HiSCazopid

In other words, backspace moved the cursor position back over prior
fields on the line and then the later line content overwrote those
fields instead of being after "tr" someplace (or truncated off).

Note that part of "-HiSCazopid" shows up on both lines. The extra
is from when top was running but tr had not started yet. top is
not managing text replacement correctly for output characters that
end up not being just "in" the next character-cell on the terminal.

The same sort of result happens when instead adding just one
carriage return (control-M) in front of that first quuoted
string instead:

0\\n8 ro1\\n    2\\92   3\\n12764\\n   25\\ni CP6\\n  117\\n:11 100.00\t0r \nHiS\\t1pid \\t2	\\t3	\\t4	\\t5	\\t6	\\t
33094 root         20    0  17172Ki    5488Ki CPU23   23   0:00   0.04% top -HiSCazopid

I do not intend to try to find all examples of characters that
cause problems but used to not cause problems.

From what I've seen, cursor positioning escape character sequences
seem to be sent through and cause overwrites at arbitrary places
on screen, based on the escape sequence content. There are command
lines around that contain such sequences. So I sometimes see the
first few lines of top's output have garbage text from commands
that were listed below at some point overwriting the top text.

Part of what is going on is top avoiding rewriting characters
that its tracking indicates have not been updated. When the
actual display and that supposed-tracking mismatch, the
display ends up wrong when updated (bad text continues to
display).

The text in commands should not make "top -a" output mess up
the display of other lines in top's output, nor of other
top output fields on the same line. In my view, if some usage
contexts need otherwise, it should take an extra command line
option to put top in a mode that might do such things. The
default behavior should strictly avoid having such things
happen.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Fri Jul 10 2020 - 20:25:14 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC