[Proposal] Native media optimization and (basic) editing support

Pre-processing of images by the browser before uploading them sounds great. True, this can be achieved (more or less) with a canvas, but that approach has several drawbacks. Apart from scaling, another common case is rotating JPEG images according to their EXIF orientation.

A common use case is uploading photos directly from a phone. Ideally they would be scaled and auto-rotated, and the EXIF orientation updated to reflect the change.
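Part of what makes auto-rotation fiddly is that the EXIF orientation field encodes eight cases, half of which swap width and height. A small helper (the function name and return shape are mine, purely for illustration) that decodes what a given value implies:

```javascript
// Decode what an EXIF orientation value (1-8) implies for display.
// Orientations 2, 4, 5 and 7 involve a mirror flip; 5-8 swap the
// stored width and height. The pure-rotation cases are 1 (0deg),
// 3 (180deg), 6 (90deg CW) and 8 (270deg CW); for the mirrored
// cases the flip+rotate decomposition varies by convention, so
// rotateCW is reported as null there.
function describeOrientation(o) {
  return {
    mirrored: [2, 4, 5, 7].includes(o),
    swapsDimensions: o >= 5 && o <= 8,
    rotateCW: { 1: 0, 3: 180, 6: 90, 8: 270 }[o] ?? null,
  };
}
```

A browser-native implementation would apply the corresponding transform while re-encoding and reset the orientation tag to 1.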

In addition some access to overwriting/removing some of the values in EXIF would be great. Sometimes there may be low-res previews or other binary data, in other cases there may be privacy related information. Being able to filter or remove these parts before uploading an image would be very useful.

Yes, assuming the aspect ratio will always be “locked”, a relatively simple way to look at it is by specifying “max width” and “max height”. For example, 300x300 would be “maximum width of 300px for landscape, or maximum height of 300px for portrait”. A zero as one of the values would then mean auto/unlimited. The same could also be written as 300w 0h, etc.
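The “max width / max height, zero means unlimited” rule above can be sketched as a small helper (illustrative only; the function name is mine):

```javascript
// Compute output dimensions for a "max width x max height" constraint
// with a locked aspect ratio. A zero means auto/unlimited for that
// axis, so 300x0 caps only the width, while 300x300 effectively caps
// the longer edge at 300. Images already within bounds are untouched.
function fitWithin(width, height, maxW, maxH) {
  let scale = 1;
  if (maxW > 0) scale = Math.min(scale, maxW / width);
  if (maxH > 0) scale = Math.min(scale, maxH / height);
  return {
    width: Math.round(width * scale),
    height: Math.round(height * scale),
  };
}
```

For example, a 4000x3000 landscape photo constrained to 300x300 comes out at 300x225, and its 3000x4000 portrait twin at 225x300.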

Now that so much of the web is user-generated content, I think it makes sense to approach a solution with the HTML file input. And as you said, it’s unfair to server owners to do costly media encoding at scale.

I have no clue how feasible this would be, but from a UX perspective, if the browser is to handle this, I’m imagining some kind of media encoding queue across all browser windows. It would work asynchronously in the background and once a file is done, it directly uploads to the specified server, even if the user no longer has that window open – kind of like how the browser can continue downloading files even after closing the window.

Given the interesting history of the <img> tag, I wonder if Tim Berners-Lee has been thinking about this problem lately…

Resizing video in the client on upload… that can be very slow (especially on mobile), very expensive in battery, and a real drain on system performance until completed. It might also run into problems with disk space unless the output is streamed as it’s re-encoded, in which case you’re seriously increasing the risk of a failed upload.

Welp, the timing couldn’t be more perfect for this - we plan to propose the following video editing API to the Web Application WG this week but we can start it in the WICG since I think this overlaps many different teams (Web Apps, Perf & Media): https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/master/MediaBlob/explainer.md

We decided to start with video editing because we haven’t heard any complaints from webdevs regarding image editing on the client; while the pain for video editing is quite large (all you need to do is edit a video on YouTube and get the queue message to feel the poor UX).

We’ve been working with partners on an early prototype and have seen a minimum of 40x perf improvements from uploading only the necessary bits and transcoding only when necessary. The benefit of this approach is that in many scenarios you don’t want to simply transcode the video, but rather manage how or what gets transcoded.

Interested to hear your feedback on the proposal. @yoavweiss @marcosc @igrigorik Where do you all feel is the best place to formally propose this? We’d like to start working out more of the specifics (e.g. we’re trying to think through error handling).

WICG feels like the right place… Just a heads up also to @mounir, as co-chair of the Media WG.

So a WICG repo seems reasonable. Since this is in our Edge explainer repo, which I’m not going to transfer, can an admin spin up a video-editing repo and I’ll move the explainer over? I’ll disentangle the image editing from this, as Ilya and I discussed it at CDS and we don’t think they should be coupled.

I’ve set up a video-editing repo here: https://github.com/WICG/video-editing

Let me know if you want me to also spin up a separate image-editing one.

Happy incubating!!

Thanks all for the great feedback! There are a few different places where related discussions took place; I want to capture a few highlights and use cases for reference, to help scope what we explore here…

WordPress core media

@azaozz for reference, copying some of the points you made in the WP core-media slack thread:

In Core we’ve looked at “in browser” image manipulations (scaling and rotating) several times since the media modal was added in 3.5. The main problems at the time were that it wouldn’t work on phones, as it takes a huge amount of RAM, and that there was no good way to write EXIF after rotating…

The EXIF thing is the biggest hurdle at the moment. There’s no good/straightforward way to change the “orientation” value there from js… Also some phones and cameras add… things there that need to be removed, like “low-res preview” etc.

If this was handled by the browser more or less “automatically”, I’d be a bit +1 to implement in WP as soon as possible (even if it would mean dealing with “polyfills” etc.)
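For context on why the EXIF hurdle above is real: just *reading* the orientation from JS today means hand-parsing the JPEG byte stream for the APP1/TIFF structures (and writing a new value back is harder still, which is the part with “no good way”). A minimal read-side sketch, with essentially no error handling, under the assumption of a well-formed file:

```javascript
// Read the EXIF orientation (1-8) out of a JPEG's APP1 segment, or
// return null if there isn't one. Sketch only: no bounds checking
// beyond the loop guard, and no handling of corrupt files.
function exifOrientation(arrayBuffer) {
  const view = new DataView(arrayBuffer);
  if (view.getUint16(0) !== 0xffd8) return null; // no SOI marker: not a JPEG
  let offset = 2;
  while (offset + 4 <= view.byteLength) {
    const marker = view.getUint16(offset);
    const size = view.getUint16(offset + 2); // segment length, incl. itself
    if (marker === 0xffe1 && view.getUint32(offset + 4) === 0x45786966) {
      // APP1 with an "Exif" signature; TIFF header follows "Exif\0\0"
      const tiff = offset + 10;
      const little = view.getUint16(tiff) === 0x4949; // "II" = little-endian
      const ifd = tiff + view.getUint32(tiff + 4, little);
      const count = view.getUint16(ifd, little);
      for (let i = 0; i < count; i++) {
        const entry = ifd + 2 + i * 12; // each IFD entry is 12 bytes
        if (view.getUint16(entry, little) === 0x0112) { // Orientation tag
          return view.getUint16(entry + 8, little);
        }
      }
      return null;
    }
    offset += 2 + size; // skip to the next marker segment
  }
  return null;
}
```

Which is exactly the kind of byte-level bookkeeping a site shouldn’t have to ship when the browser already has an EXIF-aware decoder.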


Roderick, crbug comment:

The web app could have UI to show the maximum size a user could upload, and the user could then choose a size within that range, without client-side calculation after the upload. It would also be helpful if we could directly define a max/min size on the input HTML tag, and the browser would directly hide/fade the file options for ineligible files.


Nolan O’Brien (Images lead at Twitter), crbug comment:

"Having access to that native photo picker (for resizing) would obviously be great. The options of original, large, medium and small are not as useful to us and our needs, and we’d frankly want to specify something specific for size constraints, but see the generalized use cases value for most app/site devs that don’t have our scale.

Going into what would really be valuable, us being able to specify the encoding format would be big. We could start with image/png and image/jpeg - then we could have room to experiment with different formats like adding webp or heic support. That way, when there is a photo in an unsupported file format, it can be auto converted to a supported format.

Granular “preferred” format encoding config options would be great too, avoid us having to double encode things and introducing quality loss. Such as: image/jpeg;q=85;chroma-subsampling=4:2:0

Specifying the target size bounds would be ideal too. Then we could have 2048x2048 uploads for mobile and 4096x4096 for desktop/tablet and experiment."

“we transcode the image before upload – same with other sites. Feel free to upload at larger sizes, it should just get shrunk by our client for you, but we won’t preserve the quality with sizes larger than 5MBs.” — Twitter


@Jake_Archibald (crbug): this should include an option to strip metadata. This has huge file size benefits, especially since some phones embed video data within the image, but it’s also great for privacy, since users may not be aware that images can include geo tags etc. Do we strip any of this stuff already?

@Matthieu (crbug): Love the idea of native resize. Would love it even more with a declarative way to specify the desired resolution, quality, … which of course would depend on browser support and user approval. +1 the option to strip metadata, and please don’t add any new metadata as the images are resized/re-encoded.

As a brief aside, I recently gave a talk / retrospective at performance.now() on why we haven’t been able to solve the image optimization problem at scale, and how we ought to approach this problem moving forward. Spoiler: if you’re reading this, you won’t be shocked that — I believe — providing the primitives outlined in this thread is a critical part of that solution.

@gregwhitworth to close the loop on our CDS discussion…

First off, I said it in person, but for the public record I want to add a strong +1 to the proposal and the API you are exploring. I have some questions and feedback on the specifics, but I’ll file those on the new WICG repo (update: here). The one big thing that’s probably worth calling out here is…

I flagged video as a potential (declarative markup) use case to explore in this proposal, but there are definitely more considerations and complications when it comes to video. For example, implications of how large clips are processed, how progress is communicated to the user, how the upload is handled, and so on. In that light, starting with a promise-based API approach is the right strategy here.

We can run these things in parallel and adjacent tracks. I’ll focus this exploration on declarative primitives for (basic) image manipulation, we can pursue an API approach for (basic) video editing, and perhaps somewhere down the road we can revisit a declarative solution for video as well.

@marcosc @yoavweiss tactical question… I think there is substantial amount of support and feedback here (see above) for exploring this space, and I’d like to create an explainer and engage interested site+browser developers in the design. Would you be open to creating a WICG repo for this work?

In terms of the name, perhaps WICG/image-editing to keep things simple and mirror WICG/video-editing. Candidates:

  • WICG/image-constraints
  • WICG/image-optimization
  • WICG/image-editing

I think I’d lean towards constraints or optimization, since that’s what it’s mostly all about. WDYT?

Hey @igrigorik, sorry it took me a while to get to this. Personally, I really like what you are proposing with the images and letting the browser handle the outputs.

Would you be open to creating a WICG repo for this work?

Yes. Absolutely. I’m scared of the amount of work browser vendors would need to do to adequately support this, but you are right that it would be hugely beneficial to all constituents and the web at large.

It’s up to you how you want to split up the work - but maybe start with a single repo (image-output?) and then spin off new incubations as needed?

@igrigorik, I’ve created https://github.com/WICG/image-output … made you admin.

Awesome, thank you Marcos! I’m hopeful that the amount of work here for browsers is actually not that much and we can certainly tackle things in pieces. Hoping we can make good progress here in 2020.

No problem. Filed a Mozilla standards position for the proposal too: https://github.com/mozilla/standards-positions/issues/237

Hypothetical: Let’s say we could do this in a very incremental way, what would be the 2 or 3 high-impact/low-complexity/high-interop things you would pick?

On the Mozilla side, we have concerns about the runaway complexity of this proposal… so if we can get agreement on just a tiny subset of things, that might be a really good start.

min/max dimensions and aspect ratio would be my number one and number two.

A lot could be implicitly built around that. The browser could just reject files that fail to match in the first version and later automatically handle necessary scaling or provide a UI for cropping when it’s needed.

Similarly, if the file input is accept=image/jpeg and it’s given a TIFF the browser could reject that or allow it and re-encode the file to jpeg on behalf of the user.

Being able to declare a max file size and have the browser say “no, that’s too big” would be handy in general. If the browser then chooses to re-encode image files to fit within the size limits, that’s just a bonus.
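The declared-max-size idea boils down to a check a polyfill already has to run today on the input’s `change` event. A sketch (the attribute concept and all names here are made up for illustration; bytes as the unit is an assumption):

```javascript
// What a hypothetical max-file-size gate would do, expressed as the
// check a polyfill runs today when files are selected. Splits the
// chosen files into those within the byte limit and those the browser
// would say "no, that's too big" about.
function filesWithinLimit(files, maxBytes) {
  const accepted = [];
  const rejected = [];
  for (const f of files) {
    (f.size <= maxBytes ? accepted : rejected).push(f);
  }
  return { accepted, rejected };
}

// Browser-only usage sketch (showError is a stand-in for app UI):
// input.addEventListener('change', () => {
//   const { rejected } = filesWithinLimit(input.files, 5 * 1024 * 1024);
//   if (rejected.length) showError('File too big, please pick another.');
// });
```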

I imagine there would need to be some way for js to get a list of what operations the browser provides and a means to disable it if it’s not sufficient. (Use case: browser says “I reject images that are too big but don’t provide any UI for the user to crop or scale the images” and js says “well, in that case, I’ll load a polyfill and handle it myself”).
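That negotiation could be as simple as comparing what the page needs against whatever operation list the browser reports. A sketch of the logic (the capability names are invented; no such API exists today):

```javascript
// Given the operations the browser says it provides and the ones the
// page requires, return what's left for a script-side polyfill to
// cover. If the result is non-empty, the page disables the native
// handling and loads its own implementation instead.
function missingCapabilities(browserProvides, pageNeeds) {
  const have = new Set(browserProvides);
  return pageNeeds.filter((op) => !have.has(op));
}

// e.g. browser: "I reject oversized images but offer no crop UI";
// page needs both -> ['crop'] falls back to the polyfill.
// missingCapabilities(['reject-oversized'], ['reject-oversized', 'crop'])
```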

Quality is a very important setting and needs to be consistent.

Currently, using the canvas approach, WebKit encodes JPEGs from canvas at a different size (i.e. not at the same JPEG compression level) than other browsers, resulting in huge images on iOS. https://bugs.webkit.org/show_bug.cgi?id=154713

Sorry, I forgot that bug was locked down. Here is a simple demo of the problem with output quality https://jsfiddle.net/49uwfrqy/8/

At canvas.toDataURL('image/jpeg', 60), WebKit produces a 23.51 KB file; other browsers, 12.96 KB.

(Screenshots: output with Chrome (Firefox matches Chrome) vs. with WebKit (i.e. Safari and all browsers on iOS).)

The output size problem is more dramatic with larger images/resolutions

As a result, in order to save bandwidth for Safari or iOS users, browser sniffing is required to produce consistent results.
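The sniff in question is the usual WebKit-but-not-Chromium check. Illustrative only; UA sniffing is fragile by nature, which is exactly why a spec-level definition of quality would be better:

```javascript
// Detect WebKit-based browsers (Safari on macOS, and every browser
// on iOS, where third-party browsers also use WebKit) so a different
// quality value can be passed to compensate for the divergent canvas
// JPEG encoder. UA strings lie, which is why this workaround is a
// symptom of the bug rather than a fix.
function usesWebKitJpegEncoder(userAgent) {
  return /AppleWebKit/.test(userAgent) &&
         !/Chrome|Chromium|Edg/.test(userAgent);
}
```

Note that Chromium-based browsers also advertise `AppleWebKit` in their UA string, hence the exclusion list.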

The spec does say the following

Different implementations can have slightly different interpretations of “quality”. When the quality is not specified, an implementation-specific default is used that represents a reasonable compromise between compression ratio, image quality, and encoding time.


Quality should be jpeg compression level for jpegs, as that is what developers expect.

Quality should be jpeg compression level for jpegs, as that is what developers expect.

I’m assuming you mean ImageMagick’s 0-100 scale?

There are some very interesting discussions to be had about what “quality” actually means, and should actually mean. IM’s scale is probably the most intuitive/guessable for developers; Photoshop’s 0-100 scale (which is significantly different) is the most familiar to designers. Both map to specific JPEG encoder settings, which might affect different images differently; neither maps cleanly to actual quality metrics. This is all, however, a distracting sidebar… I should blog about it.
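One concrete source of confusion worth noting: canvas takes quality as a 0.0-1.0 float, while ImageMagick-style tooling uses 0-100, and per the HTML spec an out-of-range canvas value is silently ignored in favor of the implementation default. A trivial adapter at least keeps a codebase on one scale (function name is mine):

```javascript
// Adapt the ImageMagick-style 0-100 quality scale developers tend to
// think in to the 0.0-1.0 float that canvas.toDataURL()/toBlob()
// accept. Clamping matters because out-of-range values are not
// errors: the browser silently falls back to its default quality.
function canvasQuality(imQuality) {
  return Math.min(100, Math.max(0, imQuality)) / 100;
}

// Browser-only usage sketch:
// canvas.toDataURL('image/jpeg', canvasQuality(60));
```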

I investigated this quite a while ago, as you can tell from the topical sample picture at the time!

I seem to recall that if you inspect the output image’s JPEG quality setting using ImageMagick, it doesn’t match the passed-in value.

In addition to the concern about runaway complexity, I’m also a little concerned about general usability. Right now it’s straightforward to get a File (which is a Blob) out of an <input type=file>. But there also might be a bunch of other sources of image data. If we want to expose the browser’s image decoding functionality, would it make more sense to expose it as the general operation on Blob so that it can be used in other cases as well?