Allowing Browser find-in-page to find non-DOM text

The way I understand the FindText API, it is meant to allow for both use-cases. It goes on at length about a “default search algorithm”, but this is meant to be overridable by the developer (it is an Interface). Indeed, from the introduction:

In dynamic lazy-loaded (or “infinite-scroll”) content, the text being searched for may not have been loaded into the DOM yet. […] To ensure interoperability, this specification describes a default find-text algorithm, but this can be overriden by the developer.

That spec definitely needs more work on the use-cases @jyasskin mentions, but I think it’s the right place for that work to happen.

This will be nice for browsers to have. Press Ctrl+F, look for text, and Discourse’s UI (for example) would scroll to the right place even though that text wasn’t in the DOM at all before the search. The developer would still have to write a lot of logic, but would no longer have to build the actual search UI. Custom UI or not, the search logic would probably remain largely the same. So when searching in Discourse, the browser’s native search feature would behave the same as Discourse’s magnifying glass button does now. A simple implementation would pass the search terms to Discourse’s search field (with the text input hidden). Pressing Enter could take Discourse to the “show more” page. So really it’s just a shortcut for using the native search UI as an input source; beyond that, most of the logic would still be in the developer’s hands on a site like Discourse.

This will most likely not happen. Content is supposed to be in the DOM, not added through a stylesheet. “Generated content” should really have been called “generated decoration”.

See also these messages/threads on the Protocols and Formats WG list:

I think the issue is not CSS generated content, but sparse DOMs, which developers use for efficiency, whether in the way that React.js does it or in the way infinite scroll works on Twitter etc.

As far as I can tell, the idea of keeping the DOM as small as possible is one that isn’t going away - so being able to find stuff in that situation really means we need to enable this… Luckily, as I understand it that is part of the point of the API proposed.

I’m leaning toward this being a good idea (people put content in it, so that should be searchable). What was said in the thread I linked to is that “it’s not meant for content, so we don’t want to encourage it by making it searchable/selectable”. That might be trying to stop a tsunami with a paper towel though…

FindText sounds good to me - I take back what I said about this being something the page should implement, since different user agent use cases have different ways search should be surfaced where the page developer shouldn’t be saddled with the UI for it (eg. screen readers, “find in page” dialogs for mobile vs. desktop).

Yep, the page could get the search string from the search feature of the agent, then use a standardized API to give back the results that tell the agent where the content is and/or how to get that content to appear on-screen (or in-markup for screen readers).
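To make that flow concrete, here is a minimal sketch of the page-side half, assuming the agent hands the page a query string and expects back a list of match locations. Every name here (`FindMatch`, `OffDomStore`, `findInStore`) is hypothetical and does not come from the FindText draft; the store is just a stand-in for whatever backing data a lazy-loading page keeps outside the DOM.

```typescript
// Hypothetical result shape: tells the agent which unrendered item holds a
// match and where, so the page can later be asked to bring it on-screen.
interface FindMatch {
  itemId: string; // identifier of the lazily rendered item (e.g. a post)
  start: number;  // character offset of the match within the item's text
  length: number; // length of the matched text
}

// Stand-in for the page's own data: item id -> plain text not yet in the DOM.
type OffDomStore = Map<string, string>;

// Naive case-insensitive substring search over the store.
// Assumption: lowercasing does not change string length for the text involved,
// so offsets in the lowercased copy match offsets in the original.
function findInStore(store: OffDomStore, query: string): FindMatch[] {
  const matches: FindMatch[] = [];
  if (query.length === 0) {
    return matches;
  }
  const needle = query.toLowerCase();
  store.forEach((text, itemId) => {
    const haystack = text.toLowerCase();
    let from = 0;
    let at = haystack.indexOf(needle, from);
    while (at !== -1) {
      matches.push({ itemId, start: at, length: needle.length });
      from = at + needle.length;
      at = haystack.indexOf(needle, from);
    }
  });
  return matches;
}
```

The agent would own the UI (dialog, screen-reader output, highlighting), and the page would only answer "where is it" and, on request, render the item named by `itemId`. How that second step is requested is left open here.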

That’s how I understand it too. @jyasskin do you think that’s a reasonable answer? @Shepazu?

Extending the FindText API’s Search Algorithm to allow page authors to override it, and having the browser use that for its own ^F search, would definitely fulfil my request.

I would love this. I have been advocating for this for quite some years now. In fact there are many examples of apps having issues searching over lazily rendered content:

  • ACE code editor, so editing online on GitHub has this problem
  • Discourse
  • Google Docs
  • PDF.JS

Old mailing list posts I made on this topic:

I think reading ideas from there is quite useful.

Yeah, ugh, on this exact subject, Discourse (as @mitar alluded to) apparently implements its own version of “find in page” when you hit Ctrl+F, and it has a noticeable lag relative to Chrome’s, a (pointless) “Search in this thread” checkbox (if I wanted to do anything else, why would I have hit Ctrl+F?), and it rejects search terms that are too short (wanted to find any mentions of the vh unit in a CSS discussion? Too bad!)

I think this actually makes a decent point about what shouldn’t be allowed in such an API, or at least what should be avoided (ie. it should probably be factored to only augment find-in-page search, in such a way that attempting to needlessly redefine in-page content searching feels extremely dirty).

Not to mention that intercepting ctrl-f is not something which really works on mobile, because users are not using the keyboard shortcut to trigger search there.

Yep, part of what I mentioned in my last comment.

Unfortunately, it looks like so far browsers haven’t been interested enough just to see the spec maintained :frowning:

Which means it is likely to come back from Working Group land to WICG, if you want to work on it and try and shop it around.

(I agree that the functionality would actually be very useful for the real web - but it’s somewhere down my personal work priority list :disappointed: )

@marcosc is this something WICG could adopt etc? Was just discussing the API in relation to ‘infinite scrolling’ the HTML spec for perf reasons.

Sorry for not responding sooner. Yes, if someone is willing to drive it - we could do the off-DOM aspects as part of WICG… However, we should integrate into FindText. FindText is tremendously useful and we should probably help Doug Schepers with that spec.

Another option is that we adopt FindText outright - as it looks like it stalled (at least, I’ve never seen it discussed before).

Yes @marcosc I think it is stalled :frowning: And I think it would be a useful spec, although it is not in the current WebPlat charter.

But for now it might be most sensible to move it to WICG, assuming the annotations WG agrees - it’s in their charter.

As well as handling endless scrolling search, it would probably be helpful if e.g. editing systems could generate an in-page search rather than reimplementing it, in the case where the text actually is all in the DOM but you want to find e.g. /\<h.\>/ as a way to search for the headings when you’re editing…
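As a rough illustration of that editing case, the sketch below scans source text for heading tags and reports where they are. It reads the pattern above as "match opening tags like `<h2>`", which is an assumption, and tightens it to `h1` through `h6`; the names `HeadingHit` and `findHeadings` are made up for this example and are not part of any proposed API.

```typescript
// Hypothetical result shape for a page-generated search over editor source.
interface HeadingHit {
  line: number; // 1-based line number in the source text
  tag: string;  // lowercased tag name, e.g. "h2"
}

// Find opening heading tags (<h1> .. <h6>, with or without attributes).
function findHeadings(source: string): HeadingHit[] {
  const hits: HeadingHit[] = [];
  const pattern = /<(h[1-6])\b[^>]*>/gi;
  source.split("\n").forEach((text, index) => {
    pattern.lastIndex = 0; // reset the global regex before scanning each line
    let m: RegExpExecArray | null = pattern.exec(text);
    while (m !== null) {
      hits.push({ line: index + 1, tag: m[1].toLowerCase() });
      m = pattern.exec(text);
    }
  });
  return hits;
}
```

An editor could feed hits like these into the browser's find UI instead of building its own results panel, if an API for that existed.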

I opened this issue:

I’m not convinced this kind of hack-around band-aid is required at this stage. While developers have wrung enormous speed ups from virtualized-list components, I have not seen anyone trying other speed-up approaches, like using drastically reduced rendering formats for off-screen content and swapping into a fully-rendered view on demand.

If the web really needs a feature like this, so be it, but I’d much prefer not diving off the classical contract of a web page being a set of content declared via HTML if we don’t have to. The performance wins of virtualized components need to be compared against alternatives we haven’t dived into yet.

That would be an interesting UI, where the browser’s find would scroll to some unstyled text, and then the page would render it while trying to keep the user’s search target on screen.