Allowing browser find-in-page to find non-DOM text

In applications like Google Docs or Discourse, off-screen content is removed from the DOM to save memory. This means the browser’s native find-in-page functionality can’t find all of the page’s content, which forces these applications to intercept the find-in-page keyboard shortcut and build their own UI around it.

Similarly, large documents like the HTML standard are split into multiple sub-pages to save memory and loading time, but this sacrifices searchability unless they build their own UI.

We could design an API that would allow these applications to provide their own results to the native find-in-page UI, without needing to replace the UI entirely.

  1. Is this already happening somewhere?
  2. If not, do folks think it’s worthwhile?

I have this exact problem in an app I’m developing - my planned solution is to just hook ctrl+f and implement my own “find in page” functionality. I think any app that needs the complexity of hiding text outside the DOM should also be surfacing its own search control.
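For illustration, the shortcut hook described above might start from something like the sketch below. It assumes the usual bindings (Ctrl+F on Windows/Linux, Cmd+F on macOS); `KeyLike`, `isFindShortcut`, and `openCustomSearch` are hypothetical names, and the DOM wiring is left as a comment so the predicate can be run on its own.

```typescript
// Minimal shape of the keyboard event fields this sketch reads.
// A real DOM KeyboardEvent has all of these properties.
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
}

// True when the event looks like the find-in-page shortcut:
// Ctrl or Cmd held, Alt not held, and the "f" key pressed.
function isFindShortcut(e: KeyLike): boolean {
  return (e.ctrlKey || e.metaKey) && !e.altKey && e.key.toLowerCase() === "f";
}

// Hypothetical wiring in a page (openCustomSearch is an assumed app function):
//
// document.addEventListener("keydown", (e) => {
//   if (isFindShortcut(e)) {
//     e.preventDefault();   // suppress the browser's own find bar
//     openCustomSearch();   // show the application's search UI instead
//   }
// });
```

Note that a hook like this only sees keyboard events, so it does nothing for users who open find-in-page from a browser menu.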

Isn’t this exactly what the FindText API aims to achieve?

I believe the FindText API is the other way around: it helps JavaScript search DOM text, rather than letting the application help the browser search non-DOM text.

The way I understand the FindText API, it is meant to allow for both use cases. It goes on at length about a “default search algorithm”, but this is meant to be overridable by the developer (it is an interface). Indeed, from the introduction:

In dynamic lazy-loaded (or “infinite-scroll”) content, the text being searched for may not have been loaded into the DOM yet. […] To ensure interoperability, this specification describes a default find-text algorithm, but this can be overriden by the developer.

That spec definitely needs more work on the use-cases @jyasskin mentions, but I think it’s the right place for that work to happen.

This would be nice for browsers to have: press Ctrl+F, look for text, and Discourse’s UI (for example) would scroll to the right place even though that text wasn’t in the DOM before the search. The developer would still need to write a lot of logic, but the search UI itself would no longer be the developer’s job. Custom UI or not, the search logic would probably remain largely the same, so searching in Discourse through the browser’s native feature would behave the same as Discourse’s magnifying glass button does now. A simple implementation would pass the search terms to Discourse’s search field (with the text input hidden), and pressing Enter could take Discourse to the “show more” page. So really it’s just a shortcut that uses the native search UI as an input source; most of the logic would still be in the developer’s hands on a site like Discourse.

This will most likely not happen. Content is supposed to be in the DOM, not added through a stylesheet. “Generated content” should really have been called “generated decoration”.

See also these messages/threads on the Protocols and Formats WG list:

I think the issue is not CSS generated content, but sparse DOMs, which developers use for efficiency, whether in the way that React.js does it or in the infinite scroll on Twitter, etc.

As far as I can tell, the idea of keeping the DOM as small as possible is one that isn’t going away - so being able to find stuff in that situation really means we need to enable this… Luckily, as I understand it that is part of the point of the API proposed.

I’m leaning toward this being a good idea (people put content in it, so that should be searchable). What was said in the thread I linked to is that “it’s not meant for content, so we don’t want to encourage it by making it searchable/selectable”. That might be trying to stop a tsunami with a paper towel though…

FindText sounds good to me. I take back what I said about this being something the page should implement: different user agents surface search in different ways (e.g. screen readers, “find in page” dialogs on mobile vs. desktop), and the page developer shouldn’t be saddled with building the UI for each of them.

Yep, the page could get the search string from the search feature of the agent, then use a standardized API to give back the results that tell the agent where the content is and/or how to get that content to appear on-screen (or in-markup for screen readers).
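To make that flow concrete, here is a rough sketch of what such a contract could look like from the page’s side. Everything in it is hypothetical: `FindProvider`, `FindMatch`, `find`, and `reveal` are names invented for illustration and do not come from the FindText draft or from any browser. The agent would hand the page a query, the page would report where matches live (including in content not currently in the DOM), and the agent would ask the page to reveal whichever match the user steps to.

```typescript
// HYPOTHETICAL API shape, for illustration only.
// Nothing here is taken from the FindText draft or any shipping browser.
interface FindMatch {
  chunk: number;   // index of the content chunk that holds the match
  offset: number;  // character offset of the match within that chunk
  length: number;  // length of the matched text
}

interface FindProvider {
  // Called by the user agent with the user's search string.
  find(query: string): FindMatch[];
  // Called by the user agent when the user navigates to a match.
  reveal(match: FindMatch): void;
}

// A page that keeps its content in memory and renders chunks lazily.
class ChunkedDocument implements FindProvider {
  // Chunks that have been "rendered"; stands in for real DOM insertion.
  readonly rendered = new Set<number>();
  private readonly chunks: string[];

  constructor(chunks: string[]) {
    this.chunks = chunks;
  }

  // Simple case-insensitive substring search over every chunk,
  // whether or not it is currently rendered.
  find(query: string): FindMatch[] {
    const needle = query.toLowerCase();
    const matches: FindMatch[] = [];
    if (needle.length === 0) return matches;
    this.chunks.forEach((text, chunk) => {
      const haystack = text.toLowerCase();
      let offset = haystack.indexOf(needle);
      while (offset !== -1) {
        matches.push({ chunk, offset, length: needle.length });
        offset = haystack.indexOf(needle, offset + needle.length);
      }
    });
    return matches;
  }

  // A real page would insert the chunk into the DOM and scroll to it.
  reveal(match: FindMatch): void {
    this.rendered.add(match.chunk);
  }
}

// Usage: the "agent" side is simulated by calling the provider directly.
const doc = new ChunkedDocument([
  "Use the vh unit for viewport height.",
  "Nothing relevant here.",
  "vh and vw are both viewport units.",
]);
const results = doc.find("vh");
doc.reveal(results[1]); // user steps to the second match
```

In this sketch the page only supplies matches and reveals them; the search box, match counter, and next/previous controls would stay with the user agent.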

That’s how I understand it too. @jyasskin do you think that’s a reasonable answer? @Shepazu?

Extending the FindText API’s Search Algorithm to allow page authors to override it, and having the browser use that for its own ^F search, would definitely fulfil my request.

I would love this. I have been advocating for this for quite some years now. In fact there are many examples of apps having issues searching over lazily rendered content:

  • ACE code editor, so editing online on GitHub has this problem
  • Discourse
  • Google Docs
  • PDF.JS

Old mailing list posts I made on this topic:

I think reading ideas from there is quite useful.

Yeah, ugh, on this exact subject, Discourse (as @mitar alluded to) apparently implements its own version of “find in page” when you hit Ctrl+F, and it has a noticeable lag relative to Chrome’s, a (pointless) “Search in this thread” checkbox (if I wanted to do anything else, why would I have hit Ctrl+F?), and it rejects search terms that are too short (wanted to find any mentions of the vh unit in a CSS discussion? Too bad!)

I think this actually makes a decent point about what shouldn’t be allowed in such an API, or at least what should be avoided (i.e. it should probably be factored so that it only augments find-in-page search, in such a way that attempting to needlessly redefine in-page content searching feels extremely dirty).

Not to mention that intercepting ctrl-f is not something which really works on mobile, because users are not using the keyboard shortcut to trigger search there.

Yep, part of what I mentioned in my last comment.

Unfortunately, it looks like so far browsers haven’t been interested enough even to see the spec maintained.

Which means it is likely to come back from Working Group land to WICG, if you want to work on it and try and shop it around.

(I agree that the functionality would actually be very useful for the real web, but it’s somewhere down my personal work priority list.)

@marcosc is this something WICG could adopt? I was just discussing the API in relation to “infinite scrolling” the HTML spec for perf reasons.

Sorry for not responding sooner. Yes, if someone is willing to drive it, we could do the off-DOM aspects as part of WICG. However, we should integrate this into FindText. FindText is tremendously useful and we should probably help Doug Schepers with that spec.

Another option is that we adopt FindText outright, as it looks like it has stalled (at least, I’ve never seen it discussed before).