A partial archive of discourse.wicg.io as of Saturday February 24, 2024.

IndexedDB 2.0 performance improvements

nolanlawson
2015-06-28

I’ve mentioned this before to @inexorabletash and @jonathank, but I thought it would be worthwhile to jot down my thoughts in one place.

TLDR: Is getAll() good enough to get top performance out of IndexedDB? Do we also need native joins or putAll()?

There’s been some discussion about bringing better integration with promises or explicit transaction support to IndexedDB. But from the PouchDB perspective, our biggest ask for IndexedDB 2.0 would be neither promises nor transaction control, but rather better bulk APIs to get faster performance. In several performance tests (e.g. this one), we’ve seen WebSQL perform up to 40x faster than IndexedDB for bulk reads in Chrome. Writes are about the same.

The new getAll() is going to help a lot with that (described here; we haven’t started using it yet in PouchDB). But we could possibly leapfrog WebSQL for write speed if we also had a putAll(). (Question: would this actually be any faster than just doing multiple parallel put()s?)
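
To make that concrete, here’s roughly what the two patterns look like (sketch only; assumes db is an already-open IDBDatabase and the store names are made up):

// Bulk read with the new getAll(): one request instead of walking an IDBCursor.
var readTx = db.transaction('docs', 'readonly');
readTx.objectStore('docs').getAll(IDBKeyRange.bound('a', 'z')).onsuccess = function (e) {
  var docs = e.target.result; // every matching record, in one array
};

// Bulk write today: fire all the put()s up front and wait for the transaction.
var writeTx = db.transaction('docs', 'readwrite');
var store = writeTx.objectStore('docs');
docsToWrite.forEach(function (doc) {
  store.put(doc); // each put() still gets its own (ignored) success event
});
writeTx.oncomplete = function () { /* all writes committed */ };

// The hypothetical putAll() would collapse the loop above into one request:
// store.putAll(docsToWrite).onsuccess = ...;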

Another big win for PouchDB would be some kind of native “join” with getAll(), since our pagination implementation (allDocs()) currently involves using an IDBCursor and manually joining two object stores, which means a lot of separate requests and therefore a lot of waiting. In WebSQL, by contrast, we can just write a single query using JOIN and whatever criteria we want inside the WHERE clause. This is a huge win for performance and is largely where that 40x number comes from.
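
To illustrate (table and column names here are simplified, not our actual schema), the WebSQL version of that page query is a single statement:

// WebSQL: one JOIN does the whole page in a single request.
db.readTransaction(function (tx) {
  tx.executeSql(
    'SELECT meta.id, meta.winning_rev, docs.json ' +
    'FROM metadata meta JOIN documents docs ON docs.rev = meta.winning_rev ' +
    'WHERE meta.id >= ? AND meta.id <= ? ORDER BY meta.id LIMIT ?',
    [startkey, endkey, limit],
    function (tx, results) {
      // results.rows holds everything needed for one page of allDocs()
    });
});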

getAll() mitigates this somewhat for the primary object store, but when joining the second object store, we still have to do a bunch of separate get() requests, because we need to fetch based on a list of keys. (Or we could sort the keys and use a cursor, but in the past I haven’t found that to be faster.) Maybe the multiple parallel get()s aren’t so bad, though; I haven’t tried it out yet.
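
Concretely, the IndexedDB version I have in mind would look something like this (rough sketch; made-up store and field names):

// getAll() on the primary store, then parallel get()s against the second
// store for each foreign key, all inside one transaction.
var tx = db.transaction(['metadata', 'documents'], 'readonly');
tx.objectStore('metadata').getAll(keyRange, limit).onsuccess = function (e) {
  var metas = e.target.result;
  var docs = new Array(metas.length);
  var docStore = tx.objectStore('documents');
  metas.forEach(function (meta, i) {
    docStore.get(meta.winningRev).onsuccess = function (e) {
      docs[i] = e.target.result;
    };
  });
  tx.oncomplete = function () { /* metas[i] now pairs with docs[i] */ };
};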

At some point soon, I’ll implement getAll() support in PouchDB and report back how much it affects performance. In the course of writing this post, I’ve started to wonder whether getAll() might already solve most of the problems I’ve outlined. :slight_smile:

inexorabletash
2015-07-02

I replied somewhere, but not here…

Very much looking forward to your getAll() performance numbers. For putAll() vs. parallel put() calls, we’ll be trying to get numbers on that. The Lovefield team let us know that the overhead of waiting for all the success events to fire (and be ignored) before a transaction can commit was a throughput bottleneck for them, so we’ve been investigating eliminating that. But we’ll look at bulk puts as well.

If anyone wants to write us some benchmarks we’d love it!
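
Even something as bare-bones as this would be a useful starting point (sketch only; assumes an open database with an object store named 'bench' keyed on 'id', and numbers will vary a lot by browser and hardware):

// Time N parallel put()s in a single transaction, measured from the first
// put() to the transaction's complete event.
function timeBulkPut(db, count) {
  return new Promise(function (resolve, reject) {
    var tx = db.transaction('bench', 'readwrite');
    var store = tx.objectStore('bench');
    var start = performance.now();
    for (var i = 0; i < count; i++) {
      store.put({ id: i, payload: 'some reasonably sized value' });
    }
    tx.oncomplete = function () { resolve(performance.now() - start); };
    tx.onerror = function () { reject(tx.error); };
  });
}
// timeBulkPut(db, 10000).then(function (ms) { console.log(ms + ' ms'); });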

jonathank
2015-07-05

It sounds to me like joins are what’s really needed for the use case you’re after?

Do you have any rough idea of how you think the join API might look (in the current implementation)? Unless you think the new promises API would be better suited, of course.

stuartpb
2015-07-19

I just want to chime in to say that I’m using PouchDB and am definitely feeling the perf burn regarding the N-step get process for getting docs with PouchDB views (even with only, like, six documents, there’s about a full second of lag), so +1 from me.

(Aside @nolanlawson: If this gets implemented, maybe the Pouch docs could start pulling back on all the “Use non-random UUIDs for a free index that works with getAll” protips that kinda sorta directly contradict best practices for B-tree balancing in CouchDB.)

I’m not actually that well-versed in the IndexedDB surface (that’s half of why I’m using PouchDB, to get away from having to know it), but I understand it provides some kind of database that uses direct indexing with linear index range access. Given that information, I actually half kind-of expected it already had some kind of mechanism like RethinkDB’s eqJoin.
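
(For reference, eqJoin in RethinkDB looks roughly like this; I’m going from memory and the table names are made up, so the details may be slightly off:)

// RethinkDB: join each row's 'sideKey' field against the primary key of the
// 'side' table, then flatten the left/right pairs.
r.table('main').eqJoin('sideKey', r.table('side')).zip()
  .run(conn, function (err, cursor) {
    cursor.toArray(function (err, joined) {
      // each element is a main doc merged with its matching 'side' doc
    });
  });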

The way I’d imagine this working (and, again, I have no idea how this kind of call looks in IndexedDB specifically, so the exact semantics would probably need finessing) is that you’d query the database with (assuming it has to be one function call) an array of objects alongside your main selector. Those objects would look like {key: 'smith', store: 'smiths'}, where the "smith" field of documents in the selection refers to primary indices of documents in the "smiths" store.

IndexedDB would then return a 2-dimensional array, where each second-level array contains the document (if any) that exists for the given index in the original array. (If you wish to get the original document, you do it by specifying the store itself and its primary index field as a position in the requested array - probably the first index.)

stuartpb
2015-07-19

A code example (written against my imaginary IndexedDB API that probably differs from the actual API in minor respects):

var ctx = new IndexedDBIsh();

// Yes, I'm pretending you can get stores synchronously.
// I know that's almost certainly not the case. It's not really a salient detail.
// If it bothers you, imagine there's an `await` in front of all of these.
var mainStore = ctx.getOrCreateStore('example-main-store',
  {primaryIndex: '_id'});
var sideStore = ctx.getOrCreateStore('example-side-store',
  {primaryIndex: 'powerWord'});
var otherSideStore = ctx.getOrCreateStore('example-side-store2',
  {primaryIndex: 'bar'});

// Also assuming putAll is a thing to make populating examples easier
mainStore.putAll([
  {_id: 'asc5g7', wizard: 'Grindelwald', foo: 'grg34535'},
  {_id: 'h556d3', wizard: 'Samguys', foo: 'phwwwbt'},
  {_id: 'p160in9', wizard: 'Oinghrom', foo: 'phwwwbt'}
]);
sideStore.putAll([
  {powerWord: 'Grindelwald', level: 7, color: 'blue'},
  {powerWord: 'Samguys', level: 100, color: 'crystal'},
  {powerWord: 'Oinghrom', level: 9009, color: 'invisible'},
]);
otherSideStore.putAll([
  {bar: 'phwwwbt', irrelevant: 'fields', arbitrary: 'document'},
  {bar: 'grg34535', assuming: 'valid', "no": "structure"},
  {bar: 'consistency', "is": "the", hobgoblin: "of", little: "minds"},
]);

// So then doing:
mainStore.getCross([
  {key: "wizard", store: "example-side-store"},
  {key: "foo", store: "example-side-store2"}
])

/* would yield something like:
[
  {key: 'asc5g7', docs: [
    {powerWord: 'Grindelwald', level: 7, color: 'blue'},
    {bar: 'grg34535', assuming: 'valid', "no": "structure"}
  ]},
  {key: 'h556d3', docs: [
    {powerWord: 'Samguys', level: 100, color: 'crystal'},
    {bar: 'phwwwbt', irrelevant: 'fields', arbitrary: 'document'}
  ]},
  {key: 'p160in9', docs: [
    {powerWord: 'Oinghrom', level: 9009, color: 'invisible'},
    {bar: 'phwwwbt', irrelevant: 'fields', arbitrary: 'document'}
  ]}
]
*/

// And if you want the original document...
mainStore.getCross([
  // you look up its key in its own data store
  {key: "_id", store: "example-main-store"},
  // alongside whatever other docs you want
  {key: "wizard", store: "example-side-store"}
])

/* and then that looks like:

[
  {key: 'asc5g7', docs: [
    {_id: 'asc5g7', wizard: 'Grindelwald', foo: 'grg34535'},
    {powerWord: 'Grindelwald', level: 7, color: 'blue'}
  ]},
  {key: 'h556d3', docs: [
    {_id: 'h556d3', wizard: 'Samguys', foo: 'phwwwbt'},
    {powerWord: 'Samguys', level: 100, color: 'crystal'}
  ]},
  {key: 'p160in9', docs: [
    {_id: 'p160in9', wizard: 'Oinghrom', foo: 'phwwwbt'},
    {powerWord: 'Oinghrom', level: 9009, color: 'invisible'}
  ]}
]
*/

nolanlawson
2015-09-07

@stuartpb The burn you are feeling with secondary indexes in PouchDB is more related to the way we architected PouchDB than a problem with IndexedDB itself. (That is, unless you notice a big speedup when you switch to WebSQL in e.g. Chrome.)

Long story short, in order to provide a reasonable abstraction over IndexedDB, WebSQL, and LevelDB, for secondary indexes we created a layer on top of PouchDB that essentially delegates the responsibility for creating the secondary index to an auxiliary PouchDB. This design has several flaws, notably in performance, but it allowed us to iterate quickly. We will probably replace it in the future.

In the meantime, we are definitely looking into improving the performance of the core APIs (especially the “cursor” APIs, allDocs() and changes()), which would speed up primary indexes and replication in PouchDB, as well as slightly speed up secondary indexes (since those are just a second PouchDB). So when I talk about joins here, I am mostly talking about internal PouchDB details, especially around how we organize “metadata” and “documents” (think: allDocs(), which iterates over document IDs and gives you the winning revision of each document, vs changes(), which iterates over document revisions keyed by the order they were added to the database).
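
Roughly (and with made-up store names, not our real schema), the two iteration patterns look like this:

// allDocs()-style: walk a store keyed by doc ID and join each entry to its
// winning revision in a second store.
var tx = db.transaction(['metadata', 'bySequence'], 'readonly');
tx.objectStore('metadata').openCursor().onsuccess = function (e) {
  var cursor = e.target.result;
  if (!cursor) return;
  tx.objectStore('bySequence').get(cursor.value.winningSeq).onsuccess = function (e) {
    // e.target.result is the winning revision for this doc ID
  };
  cursor.continue();
};

// changes()-style: walk the second store directly, in the order revisions
// were written.
db.transaction('bySequence').objectStore('bySequence').openCursor().onsuccess = function (e) {
  var cursor = e.target.result;
  if (cursor) { /* cursor.value is the next revision in insertion order */ cursor.continue(); }
};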

BTW I also did a brief review of getAll() and getAllKeys() and how they might be used in PouchDB; some discussion is on GitHub.