Visualizing whitespace characters

i18n

#1

In code listings, text input examples, text editors and the like, it can be important to display whitespace characters unambiguously. A number of Unicode characters are available, e.g. the centered dot · or the open box ␣ (&blank;); some are even specifically intended for this purpose in one context or another, e.g. the space control symbol ␠ and the blank input symbol ␢. Tabulators and line breaks are often shown as arrows such as → and ↩︎ or ↵.

As far as I know, there currently is no CSS way to display alternate glyphs for whitespace. Assuming it should be possible, which existing or new property (and hence module) would be the right place?

pre, code, textarea.showinvisibles {
    white-space: pre-visible;
    font-variant-alternates: visible-controls; /* high level, assumes support by the font (format) */
    font-feature-settings: "vwsp" 1; /* low level, assumes support by the font (format) */
    text-transform: visible-whitespace;
}

#2

Hmmmm. My first instinct is text-transform, but that doesn’t actually work well - the tab still needs to be tab-sized, the linebreak still needs to break the line, etc.

font-variant probably works, but that does rely on fonts actually exposing such things. Is that supported by any font right now?

Assuming fonts do support it, we should still be able to synthesize them for fonts that don’t (or fallback to some built-in that has them, same deal).
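
To make the tab-sizing concern concrete, here is a minimal TypeScript sketch of why a fixed substitution is not enough. `showTabs` is a hypothetical helper, not a CSS or browser API, and it assumes a monospace layout in which every non-tab character is one column wide. It draws each tab as an arrow padded out to the next tab stop, so the following text stays aligned, which a flat replacement with a fixed number of arrows cannot do.

```typescript
// Hypothetical helper (not a CSS or browser API): render each tab as an
// arrow padded with spaces up to the next tab stop.
function showTabs(line: string, tabSize: number = 4): string {
  let out = "";
  let column = 0;
  for (let i = 0; i < line.length; i++) {
    const ch = line.charAt(i);
    if (ch === "\t") {
      // Width of this tab depends on where it starts, not on a fixed count.
      const width = tabSize - (column % tabSize);
      out += "→";
      for (let pad = 1; pad < width; pad++) out += " ";
      column += width;
    } else {
      out += ch;
      column += 1; // assumption: one column per UTF-16 code unit
    }
  }
  return out;
}

console.log(showTabs("a\tb"));   // "a→  b"
console.log(showTabs("abc\tb")); // "abc→b"
```

In both calls the `b` lands in column 4, whereas replacing every tab with four arrows would push it to different columns.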


#3

Simple replacement maps would probably be sufficient, e.g.:

.example {
    substrings: replace(" " "\t", "␣" "→→→→") prepend("\n", "↵");
}

where the replace() function replaces the given substrings with other substrings (" " with "␣" and "\t" (tab) with four "→" characters in the example), while the prepend() function adds a substring before each occurrence of the corresponding substring (it adds "↵" before each line break in the example).
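
The intended semantics can be modelled in a few lines of TypeScript. This is only a sketch of the idea: `replaceSubstrings` and `prependSubstrings` are hypothetical names, nothing like them exists in CSS, and the single left-to-right pass (so that replacement output is never matched again) is an assumption the proposal does not spell out.

```typescript
// Hypothetical model of the proposed replace(): parallel lists of search
// strings and replacements, applied in one left-to-right pass.
function replaceSubstrings(text: string, from: string[], to: string[]): string {
  const pairs = from.map((f, idx) => [f, to[idx] ?? ""] as const);
  let out = "";
  let i = 0;
  while (i < text.length) {
    const hit = pairs.find(([f]) => f.length > 0 && text.startsWith(f, i));
    if (hit) {
      out += hit[1];        // emit the replacement
      i += hit[0].length;   // skip the matched substring
    } else {
      out += text.charAt(i);
      i += 1;
    }
  }
  return out;
}

// Hypothetical model of the proposed prepend(): keep the target substring
// and put a prefix in front of every occurrence.
function prependSubstrings(text: string, target: string, prefix: string): string {
  if (target.length === 0) return text;
  return text.split(target).join(prefix + target);
}

const replaced = replaceSubstrings("a b\tc\n", [" ", "\t"], ["␣", "→→→→"]);
console.log(JSON.stringify(prependSubstrings(replaced, "\n", "↵")));
// "a␣b→→→→c↵\n"
```

Note that the line break itself survives in the output, which is the difference between prepending and replacing.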

The same mechanism could probably work for other presentation-only text-processing purposes, such as trimming leading and trailing whitespace inside inline elements, or removing substrings or the entire text content of an element (e.g. for text-to-image replacement).

While there should be no problem to do replacements, a possible issue with prepending something before line breaks is that it may result in line overflow if the line does not have sufficient empty space at the end for the prepended string. This would probably need some way to enable positioning of the prepended substring relative to the original substring, e.g.:

.example {
    substrings: prepend("\n", "↵", 0);
}

where 0 works somewhat like left: 0 for absolutely positioned elements, so the prepended substring would be positioned above the original substring (on the z-axis) and would not affect line length.


#4

That reminds me of @text-transform.


#5

I don’t know of any font that supports something like this, and I’m not sure which OpenType feature would be appropriate, hence the fake vwsp one above. I share your concerns about text-transform, but that doesn’t mean it’s impossible.


#6

All those details are precisely why I don’t think text-transform (or equivalents) is the correct way to go. Doing this well actually has lots of fiddly little details that can’t be done reasonably by just substituting characters - this is fundamentally a display setting.

@Crissov: Okay, so if there’s no OpenType feature, then I’d propose we have it as a new font-* property. font-invisibles: normal | show;? The UA would generate the “invisible characters” for you and display them appropriately. And then, if OpenType later adds some way for fonts to provide their own invisibles glyphs, we can just have this tie into that, similar to how we currently have various features that can either be synthesized by the UA (bold, italic, small caps) or provided by the font.


#7

But if a font-… property were introduced, the next thing authors would ask for would be the ability to choose the actual glyphs and maybe also the color. I doubt whitespace would be worth a designated pseudo-element, though.


#8

Yeah, I totally believe they’d ask for it. And if we want that to happen, fonts should provide those characters somehow. 🙂


#9

PrinceXML has implemented the prince-text-replace property, which takes a list of substitution pairs. We use it extensively to add some typographic sophistication to punctuation: adding thin spaces around em-dashes, adding hair spaces between consecutive quote marks, etc.

prince-text-replace: "’”" "’\200A”" "—" "\200B—\200B"
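
For readers unfamiliar with the pair-list idea, here is a rough TypeScript model of it. It is not Prince’s implementation: `textReplace` and `Rule` are hypothetical names, and applying the rules one after another in list order is an assumption that may differ from Prince’s actual matching. The CSS escapes \200A and \200B from the declaration above are written as \u200A and \u200B.

```typescript
// Hypothetical model of a search/replacement pair list.
type Rule = readonly [search: string, replacement: string];

function textReplace(text: string, rules: readonly Rule[]): string {
  let out = text;
  for (const [search, replacement] of rules) {
    if (search.length === 0) continue; // ignore empty search strings
    // Assumption: rules run in order, each over the previous rule's output.
    out = out.split(search).join(replacement);
  }
  return out;
}

const rules: Rule[] = [
  // U+2019 U+201D  ->  U+2019, U+200A HAIR SPACE, U+201D
  ["\u2019\u201D", "\u2019\u200A\u201D"],
  // U+2014 EM DASH ->  wrapped in U+200B ZERO WIDTH SPACE, as in the declaration
  ["\u2014", "\u200B\u2014\u200B"],
];

const sample = "\u201CHe said \u2018no\u2019\u201D\u2014twice.";
const result = textReplace(sample, rules);
console.log(result.length - sample.length); // number of characters inserted
```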


#10

I would like to note that CSS Text 3 actually states that control characters should be rendered with their visual representations. This is currently implemented in MS Edge and FF behind a flag. Unfortunately, it excludes the whitespace you’re looking for 😕

Is there any specific reason why \n, \r and \t can’t work in these scenarios, though?


#11

Indeed, here’s the relevant quote from the current draft:

Control characters (Unicode category Cc) other than tab (U+0009), line feed (U+000A), form feed (U+000C), and carriage return (U+000D) must be rendered as a visible glyph and otherwise treated as any other character of the Other Symbols (So) general category and Common script. The UA may use a glyph provided by a font specifically for the control character, substitute the glyphs provided for the corresponding symbol in the Control Pictures block, generate a visual representation of its codepoint value, or use some other method to provide an appropriate visible glyph. As required by [UNICODE], unsupported Default_ignorable characters must be ignored for rendering.
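
As a rough illustration of one option the quoted text allows (substituting Control Pictures, and falling back to a generated code point label), here is a TypeScript sketch. `showControls` is a hypothetical helper and does not reflect how any UA implements this. The `[U+XXXX]` label format is my own choice for the C1 range, which has no Control Pictures counterparts.

```typescript
// Hypothetical helper: make Cc characters visible, except the four that
// the quoted draft text exempts (tab, line feed, form feed, carriage return).
function showControls(text: string): string {
  const exempt = [0x09, 0x0a, 0x0c, 0x0d];
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i); // all Cc characters are in the BMP
    if (exempt.indexOf(code) !== -1) {
      out += text.charAt(i);
    } else if (code <= 0x1f) {
      // C0 controls map onto U+2400..U+241F in the Control Pictures block.
      out += String.fromCharCode(0x2400 + code);
    } else if (code === 0x7f) {
      out += "\u2421"; // SYMBOL FOR DELETE
    } else if (code >= 0x80 && code <= 0x9f) {
      // C1 controls: generate a label from the code point value.
      const hex = ("0000" + code.toString(16).toUpperCase()).slice(-4);
      out += "[U+" + hex + "]";
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}

console.log(showControls("a\u0000b\u001b[0m")); // "a␀b␛[0m"
```

Tabs and line breaks pass through unchanged, which matches the observation above that the current text excludes the whitespace this thread is about.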