A partial archive of discourse.wicg.io as of Saturday February 24, 2024.

Why a straight array for canvas.getImageData?

Anthony_Caudill
2015-08-03

One of the things that’s perplexed me about HTML 5 was the decision to string the color component values of getImageData together in a straight, one dimensional array. From a developer’s perspective, it is both inconvenient and unperformant to have to carefully loop through each of the values by adding 1, 2, and 3 to the indices of the colors. From an organizational perspective, it’s a nightmare because you have to copy the array to an array of appropriately structured objects to make sense of the data. This copying slows all operations down by orders of magnitude and needlessly complicates code. Why not make an array of objects with properties red, green, blue, and alpha?

AshleyScirra
2015-08-03

It is actually more performant to return a one dimensional array of groups of four elements, because this is what is stored in the native code in the browser. Constructing thousands (or even millions) of small Pixel objects with r, g, b and a properties would be a significant performance impact. It’s straightforward to do this yourself if you want, but the conversion to and from this format would probably be a lot slower than the pixel operations you were going to do anyway.

One technique I like is to make a Uint32Array view to the returned data. Now each pixel is one array element, and you can pick out pixels with bitwise logic (which is very fast).

Anthony_Caudill
2015-08-03

It’s also useless, because any and all operations you’d need to access the image buffer for require direct interaction with the color components. Your bitwise logic operations would be redundant if the pixels were sorted into object arrays at the browser level, nor are they faster than to create object arrays. It makes zero sense for browser vendors to code canvas pixels as a straight array, unless that’s the way the hardware expects it to be fed and even then, a “visor” could interpret the array as objects in the same way Uint32Array interprets the array as an integer. Your bitwise logic extractions are absolutely not faster, or even as fast, than a one-time native code object array struct.

Just to find out, I made an algorithm which extracts palettes from true colors images and exchanges them. The one which compensates for the array not being sorted into objects is actually slower than the one which does. Switching just one image’s palette takes 2 1/2 minutes on a Core2Duo 2.66 Ghz machine with one core.

AshleyScirra
2015-08-03

Yes, the hardware does use it in that format too. I have a lot of native code experience and every image related API I have ever used in native land also uses that kind of format. It is the fastest format, because that is the raw representation and converting it to or from anything else would have an O(n) performance overhead.

I’d be curious to see your benchmark, since I am pretty sure the fastest way is to access the typed array directly.

stuartpb
2015-08-03

Is that benchmark taking the time to convert the Array into your desired format into account? Remember: this is the format the image is already in. Sending the same format to JS lets the VM assemble the Uint8ClampedArray with a hardware/kernel/stdlib-assisted operation like memcpy under the hood (or even a copy-on-write object that reads directly from the underlying data without needing a special RGBA-address-translation interface) - even doing the work for the conversion in the implementation is still going to have that hitch.

Also, if your comparison is taking the conversion time into account and it’s faster, are you doing linear access for both loops? Row-oriented loops can end up having faster performance, even with higher complexity, than column-oriented loops, due to the performance advantages of cache locality.

Anthony_Caudill
2015-08-04

No, it doesn’t, because as I explicitly said, I made two version of the algorithm, one with reorganization and one without. You see, the additional operations required to pick out each and every time sum up to more than the cost of reorganization at start, particularly when the data is operated on multiple times. The more complex the operation, the greater the ultimate penalty.

Here’s a fiddle of the whole thing, complete with benchmarking. The operation takes a while so you’ll need to either leave the browser alone during the process (the browser will become unresponsive during the benchmarking), or approve the prompts if your browser tries to time it out.

Instructions: Browse for any two different images using the buttons provided (Image and Palette). Then click “Repalette”. The first image will be recolored to match the palette of the second. You won’t be able to see it though, 1) because the benchmarking requires the original image be used rather than the recolor; and 2) browsers like Firefox have runover problems between setTimeout and long-running code. (I could fix that… it’s not a priority. If you want to see the recolor, comment out the setTimeout in Repalette().)

https://jsfiddle.net/hc52jx04/5/

stuartpb
2015-08-04

Okay - so that’s the performance profile for your use case. There’s nothing stopping you from implementing this optimization for your use case.

However, the point I think you might be missing here is that many operations on image data (I would cautiously estimate the majority of them) don’t involve repeated accesses to the same points (some, like a calculation of the convex hull, don’t even access every point). For these use cases, reformatting the array would be a needless performance hit, and for the cases where it does increase performance (I assume because the property accesses get JITted into register operations in a way the VM doesn’t anticipate for typed arrays), it’s using an operation that the end dev can optimize for their own case (maybe you’ll want to implement some layers of memoization alongside the raw channel values).

Anthony_Caudill
2015-08-05

Here’s a newer edit of the fiddle, with recommendations from Moz’s graphics team. http://jsfiddle.net/hc52jx04/16/

On my computer, the reorganized algorithm is 200x faster.