Re: svn commit: r352558 - head/usr.bin/top

From: Mark Millard <marklmi_at_yahoo.com>
Date: Fri, 10 Jul 2020 16:44:00 -0700
On 2020-Jul-10, at 15:25, Mark Millard <marklmi at yahoo.com> wrote:

> On 2020-Jul-10, at 11:05, Yuri Pankov <yuripv at yuripv.dev> wrote:
> 
>> Steve Wills wrote:
>>> On 11/28/19 4:08 PM, Mark Millard via svn-src-head wrote:
>>>>> Author: daichi
>>>>> Date: Fri Sep 20 17:37:23 2019
>>>>> New Revision: 352558
>>>>> URL:
>>>>> https://svnweb.freebsd.org/changeset/base/352558
>>>>> 
>>>>> 
>>>>> Log:
>>>>>   top(1): support multibyte characters in command names (ARGV array)
>>>>>   depending on locale.
>>>>>    - add setlocale()
>>>>>    - remove printable() function
>>>>>    - add VIS_OCTAL and VIS_SAFE to the flag of strvisx() to display
>>>>>      non-printable characters that do not use C-style backslash sequences
>>>>>      in three digit octal sequence, or remove it
>>>>>   This change allows multibyte characters to be displayed according to
>>>>>   locale. If it is recognized as a non-display character according to the
>>>>>   locale, it is displayed in three digit octal sequence.
>>>>> 
>>>> 
>>>> Initially picking on tab characters as an example of what is
>>>> probably a somewhat broader issue . . .
>>>> 
>>>> Ever since this change, characters like tabs that do not fit
>>>> in the next character cell when output, but for which they
>>>> are !isprintable(...), now mess up the top display. Again
>>>> using tab as an example: line wrapping from the text having
>>>> been shifted over by more than one character cell. top does
>>>> not track the line wrapping result in how it decides what
>>>> to output for the following display updates.
>>>> 
>>> Apologies for the way late reply here, but I just now bothered tracking this down. This commit seems to be the cause of some corruption I'm seeing in long running top(1) as well. As Mark mentions, if I use "hh" it clears up. Should I open a bugzilla bug? I can share screenshots of the corruption, such as:
>>> https://i.imgur.com/Xqlwf9h.png
>>> https://i.imgur.com/Jv0d5NU.png
>> 
>> Does removing VIS_SAFE fixes the issue for you?
>> 
>> As for original Mark's report (which I missed), removing isprintable() doesn't look wrong as vis(3) should take of its functionality (and in multibyte-aware way).
> 
> vis (as used) and the old isprintable logic are not
> equivalent when multi-byte is not needed/involved.
> Otherwise I'd not have had anything to ever report.
> If vis can do what is needed, more work needed to
> be done when the change was made in order to avoid
> msesed up displays in single-byte contexts.
> 
>> Also, is there an easy way to reproduce this?
> 
> The following sort of command (the empty space inside quoted
> text are tab characters):
> 
> # tr '0\n      1\n     2\n     3\n     4\n     5\n     6\n     7\n     8\n' '\t0       \t1     \t2     \t3     \t4     \t5     \t6     \t7     \t8' < /dev/zero > /dev/null
> 
> causes my 200 character wide window running top to show:
> 
> 32920 root        100    0  12764Ki    2420Ki CPU3     3   2:22  99.87% tr 0\\n	1\\n	2\\n	3\\n	4\\n	5\\n	6\\n	7\\n	8\\n \\t0	\\t1	\\t2	\\t3	\\t4	\\t5	\\t6	\\t733   \\t8       20        7172      5448Ki CPU23   23   0:00 0.04% top -HiSCazopid
> 
> But that does not show where the lines wrap at the edges of the window,
> so breaking it up explicitly after the first "\" in \\7:
> 
> 32920 root        100    0  12764Ki    2420Ki CPU3     3   2:22  99.87% tr 0\\n	1\\n	2\\n	3\\n	4\\n	5\\n	6\\n	7\\n	8\\n \\t0	\\t1	\\t2	\\t3	\\t4	\\t5	\\t6	\
> \t733   \\t8       20        7172      5448Ki CPU23   23   0:00   0.04% top -HiSCazopid
> 
> Note how \n turned into \\n , taking an extra character for
> each \n . Similarly for \t vs. \\t . (Other examples do
> similarly.)
> 
> The tab characters really do use more than one character cell
> on the display (sometimes).
> 
> The text from the tr command ends up spread across 2 lines
> as things look like in the window where top is running.
> 
> I ran top in another ssh session first and then the tr command.
> Before running the tr command, top showed as:
> 
> 33019 root         20    0  17172Ki    5448Ki CPU24   24   0:00   0.05% top -HiSCazopid
> 
> If you do not end up with top listed just after tr in top's output,
> then it will not be top's line that ends up partially overwritten.
> 
> If you have wider windows, you may need more text in the tr quoted
> strings.
> 
> In another experiment I inserted a large number of backspace characters
> (control-H's) at the front of the first quoted string in the tr command.
> The top output displayed:
> 
> 0\\n5 ro1\\n    2\\93   3\\n12764\\n   25\\ni CP6\\n   97\\n:12 100.00\t0r \nHiS\\t1pid \\t2	\\t3	\\t4	\\t5	\\t6	\\t
> 33094 root         20    0  17172Ki    5488Ki CPU21   21   0:00   0.06% top -HiSCazopid
> 
> In other words, backspace moved the cursor position back over prior
> fields on the line and then the later line content overwrote those
> fields instead of being after "tr" someplace (or truncated off).
> 
> Note that part of "-HiSCazopid" shows up on both lines. The extra
> is from when top was running but tr had not started yet. top is
> not managing text replacement correctly for output characters that
> end up not being just "in" the next character-cell on the terminal.
> 
> The same sort of result happens when instead adding just one
> carriage return (control-M) in front of that first quuoted
> string instead:
> 
> 0\\n8 ro1\\n    2\\92   3\\n12764\\n   25\\ni CP6\\n  117\\n:11 100.00\t0r \nHiS\\t1pid \\t2	\\t3	\\t4	\\t5	\\t6	\\t
> 33094 root         20    0  17172Ki    5488Ki CPU23   23   0:00   0.04% top -HiSCazopid
> 
> I do not intend to try to find all examples of characters that
> cause problems but used to not cause problems.
> 
> From what I've seen, cursor positioning escape character sequences
> seem to be sent through and cause overwrites at arbitrary places
> on screen, based on the escape sequence content. There are command
> lines around that contain such sequences. So I sometimes see the
> first few lines of top's output have garbage text from commands
> that were listed below at some point overwriting the top text.
> 
> Part of what is going on is top avoiding rewriting characters
> that its tracking indicates have not been updated. When the
> actual display and that supposed-tracking mismatch, the
> display ends up wrong when updated (bad text continues to
> display).
> 
> The text in commands should not make "top -a" output mess up
> the display of other lines in top's output, nor of other
> top output fields on the same line. In my view, if some usage
> contexts need otherwise, it should take an extra command line
> option to put top in a mode that might do such things. The
> default behavior should strictly avoid having such things
> happen.

I accidentally had a ^M instead of a ^H (backspace) at
one position in my backspace example. Removing that
carriage return produced:

33345 root         98    0  12764Ki  0\\n20Ki CP1\\n   82\\n:20  99.99% 4\\n-HiS5\\npid 6\\n	7\\n	8\\n \\t0	\\t1	\\t2	\\t3	\\t4	\\t5	\\t6	\\t7	\
33094 root         20    0  17172Ki    5620Ki CPU24   24   0:02   0.04% top -HiSCazopid

And I tried:

# tr '\a\b\c\d\e\f\g\h\i\j\k\l\m\n\o\p\q\r\s\t\u\v\w\x\y\z\0\1\2\3\4\5\6\7\8\9' '\9\8\7\6\5\4\3\2\1\0\q\w\e\r\t\y\u\i\o\pa\s\d\f\g\h\j\jk\l\z\x\c\v\b\n\m' < /dev/zero > /dev/null

This seems to have worked as desired: truncating the tar -a output for
the tr process the end of the tar line, with: \\7\\6\\5

No other lines were messed up.

I also tried just control-K (Vertical Tab) sequences and it resulted in:

33417 root        103    0  12764Ki    2420Ki CPU10   10   1:03  99.96% tr \v\v\v\v\v\v\v\v\v \v\v\v\v\v\v\v\v\v

Similarly control-L (Form Feed) characters resulted in:

33429 root        102    0  12764Ki    2420Ki CPU12   12   0:31  99.99% tr \f\f\f\f\f\f\f\f \f\f\f\f\f\f\f

Similarly DEL characters resulted in:

33456 root         93    0  12764Ki    2420Ki CPU15   15   0:12  99.99% tr \177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177\177 \177\177\177\177\17

(truncating at the end of top display line).

QUOTE
    VIS_SAFE	 Only encode "unsafe" characters.  Unsafe means	control	char-
		 acters	which may cause	common terminals to perform unexpected
		 functions.  Currently this form allows	space, tab, newline,
		 backspace, bell, and return --	in addition to all graphic
		 characters -- unencoded.
END QUOTE

If VIS_SAFE is avoided, it might be that space would end up encoded instead
of unencoded. that would not be good.

But it does seem that encoding or replacing tab, newline, backspace, bell,
and return would be more appropriate than what now happens.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
Received on Fri Jul 10 2020 - 21:44:10 UTC

This archive was generated by hypermail 2.4.0 : Wed May 19 2021 - 11:41:24 UTC