[Proposal] Native media optimization and (basic) editing support

Related discussion: crbug.com/1018022

Images account for ~50% of transferred bytes and remain one of the biggest optimization problems and opportunities on the web. The reasons why this problem persists are numerous and complex. An incomplete list, in no particular order…

  • Most users who upload media are not tech savvy enough to know about formats, file sizes, etc. (nor should they be!). The natural flow is to pick the file from a media picker and hit upload.
  • The torso and long tail of the web are not set up to bear the cost of image optimization: the economics of the long tail are to host the largest number of sites and assets at the lowest cost; format optimization, resizing, etc., is costly in CPU and storage, so it is omitted.
  • Image optimization is hard™ and setting up CDNs or open source services requires awareness, technical know-how, and often a credit card.

We (the webperf community and web developers at large) have been beating our heads against this problem for over a decade but the “solution at scale” remains an unsolved challenge.

I will suggest that the only way to make progress in this space is to ensure that media leaving the device is optimized against criteria specified by the upload target. That is, the optimization should happen on the user’s device, before it is uploaded, and according to criteria specified by the service or site that is initiating the upload. Common criteria are: accepted file type, file size, aspect ratio or max width/height, or duration for video/animated images.

Prompts like the above are common on the web and are a terrible user experience. As a user faced with such a dialog: how do I resize the image to fit, how do I reduce the file size, how do I crop?

The browser can fix both the technical and cost problems faced by site owners, as well as significantly improve the experience (latency, cost) for the user who is initiating the upload.

A smarter (image) file input ~ MVP

Note: all names below are examples, subject to naming bikeshed, etc.

<input type="file" accept="image/*">

We already support some control over the inputs (via `accept`) to file upload. What is missing are the output controls, which the site owner could specify to instruct the browser on how it should assist the user. Such outputs could be…

Output filetype

<input type="file" 

Transcoding files on the server incurs the cost of potentially unoptimized upload for the user, as well as CPU transcoding costs for the server. In many cases the site knows the exact format it needs to receive, and the browser should be able to transcode it before it is uploaded.

Maximum upload size

<input type="file" 

The browser should automatically re-encode the image on behalf of the user, with the best quality it can, against the specified limit. For example, if the user picks a 32MB image from their mobile gallery, it shouldn’t throw an error but do the work on behalf of the user to meet the page specified criteria.
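As a rough sketch of what "best quality against the specified limit" could mean, here is a binary search over an encoder quality setting. The `EncodeSize` callback and the `bestQualityUnder` name are hypothetical; a real browser would plug in its own encoder and may well use a different strategy (including downscaling).

```typescript
// Hypothetical encoder hook: returns the encoded size in bytes for a quality
// in [0, 1]. It is injected so the search logic does not depend on any real encoder.
type EncodeSize = (quality: number) => number;

// Find the highest quality whose encoded size fits within maxBytes.
// Assumes size grows monotonically with quality. Returns null when even the
// lowest quality does not fit (the caller would then need to downscale).
function bestQualityUnder(
  encodeSize: EncodeSize,
  maxBytes: number,
  steps: number = 7,
): number | null {
  let lo = 0.05; // lowest quality we are willing to try
  let hi = 1.0;
  if (encodeSize(hi) <= maxBytes) return hi;  // already fits at full quality
  if (encodeSize(lo) > maxBytes) return null; // cannot fit by quality alone
  // Invariant: quality lo fits, quality hi does not.
  for (let i = 0; i < steps; i++) {
    const mid = (lo + hi) / 2;
    if (encodeSize(mid) <= maxBytes) lo = mid;
    else hi = mid;
  }
  return lo;
}
```

Each step costs one trial encode, so a handful of steps keeps the work bounded on slow devices.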

Dimension and aspect ratio constraints

<input type="file" 
   outputDimensions="1:1"> // or “100x100px”, “1024px”...

The browser should automatically resize the image against the specified width or height requirement on behalf of the user. If the aspect ratio of the input image does not match, the browser should offer a simple UI that lets the user decide what to crop. For bonus points, the browser can also provide smart crop previews to assist the user; apply and demonstrate some of that ML magic we keep hearing about!
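For illustration, a crop UI would need a starting rectangle when the aspect ratios differ. The sketch below computes the largest centered crop for a requested ratio such as 1:1; `centerCrop` and `Rect` are hypothetical names, and a smart crop would choose a different rectangle.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Largest centered rectangle with aspect ratio ratioW:ratioH inside the source.
function centerCrop(srcW: number, srcH: number, ratioW: number, ratioH: number): Rect {
  const target = ratioW / ratioH;
  if (srcW / srcH > target) {
    // Source is too wide: keep the full height, trim the sides.
    const width = Math.round(srcH * target);
    return { x: Math.floor((srcW - width) / 2), y: 0, width, height: srcH };
  }
  // Source is too tall (or already matches): keep the full width, trim top and bottom.
  const height = Math.round(srcW / target);
  return { x: 0, y: Math.floor((srcH - height) / 2), width: srcW, height };
}
```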

A smarter (video) file input ~ beyond MVP

(Note: I would suggest we start by exploring image oriented use cases first, but we should keep in mind that video has similar (and even more acute and amplified) challenges for sites and users)

Many of our mobile devices are now shooting at 4K resolution, which clocks in at 10Mbps. Such uploads are costly for the user, prone to fail due to large file sizes, expensive to store and extremely expensive to manipulate on the server, and are rarely what is desired to be served to the user. For these reasons, free video optimization CDNs (modulo YouTube) are not a thing on the web.

The browser is in a position to help both the user and site owner: it can downsample the video prior to upload, convert to alternate format if necessary, and provide a UI for the user to trim prior to upload, e.g…

<input type="file" 
  • Provide a UI to trim video to specified length prior to upload
  • Enable the site to provide and enforce max filesize, accepted format.


Doesn’t canvas already allow me to resize, re-encode? Also, WASM?

In theory, yes, some of this is possible today (squoosh.app is an example) and proposed capabilities like WebCodecs might unlock even more powerful use cases in the future. However, the fact that something is or may be possible in the future does not automatically mean that it will be adopted at scale and by all sites.

In practice, implementing resizing, size optimization, cropping, etc., are hard technical problems. Case in point: it’s been possible to resize images via canvas for a long time, but that’s not a common or widely used best practice. The browser can and needs to own this problem if we want to see change at scale. At the same time, for those who want to pop the hood and implement their own variant: great, we’ve got APIs for you!

How have others solved it today?

This problem space is hard. In fact, entire companies have been built around it: UploadCare, FileStack, etc. What’s described in this doc is a (small) subset of the services they offer, but a critical subset that should not require a credit card or technical knowledge to integrate.

There are free image CDNs, doesn’t that solve the problem?

No, CDNs do not solve all the problems.

At upload time, the important criterion is that the media is optimized locally and before it is sent to the upload (potentially, CDN) server: the user must be able to perform basic editing operations like crop, video trim, rotate, etc., locally and with low latency; the file must be (re)encoded prior to upload to maximize likelihood of upload success and minimize data cost for the user, as well as processing and storage costs for the server. Further, a CDN should not be a requirement for delivering a reasonable user experience on the web: perhaps a recommended one, but not required…

At serving time, yes, CDNs are and will remain an important best practice, as they can perform further device specific optimizations (e.g. re-encode and serve different formats for various browsers), apply further customizations and transforms, provide media management, etc.


Output filetype

Could this be a list in priority order? “Please give me webp if you can, otherwise JPEG”.
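A priority list like this could resolve to "first entry the browser can encode". A minimal sketch, where `pickOutputFormat` and the `supported` set are hypothetical stand-ins for whatever the browser knows about its own encoders:

```typescript
// Pick the first requested output format that the browser's encoders support.
// Returns null when none of the preferred formats can be produced.
function pickOutputFormat(preferred: string[], supported: Set<string>): string | null {
  for (const type of preferred) {
    const normalized = type.trim().toLowerCase();
    if (supported.has(normalized)) return normalized;
  }
  return null;
}
```

For example, a site asking for `["image/webp", "image/jpeg"]` would get JPEG from a browser that can encode only JPEG and PNG.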

Maximum upload size

This is great. However, the quality of browsers’ JPEG encoders is pretty varied. Would it be worth including a ‘quality’ option instead, where the way to determine quality was standardised? This means browsers with poorer JPEG encoders would produce larger files, but quality would be consistent.

It feels like this would be in addition to the maximum upload size, where the maximum upload size would take priority.

Dimension and aspect ratio constraints

It isn’t clear to me how the user would specify “200px height but auto width” or “300px width but auto height”. Maybe use 200h and 300w?

Yeah, that makes sense! The Accept header enables the client to advertise what it is able to accept from the server, and in this case it’s the inverse, with the server expressing what it is able to accept from the client.

Personally, I would steer clear of this. As you said yourself, “quality” settings vary between formats and, for each format, between encoders. I don’t think that’s a problem we can solve, or need to drag into this discussion. I would still focus on output size as the leading signal, and defer to the browser to do its best.

Not sure what the best syntax here is but yeah that makes sense. I’m hoping that we can find some precedent that we can lean on here. Something in CSS? The key use cases I see are: constrain by width, constrain by height, constrain by width and height, constrain by aspect ratio… which, I guess is similar to previous.

Pre-processing of images by the browser before uploading them sounds great. True, this can be achieved (more or less) by using canvas but that has several drawbacks. Apart from scaling, another common case is rotating JPEG images according to EXIF orientation.

A common use case is uploading photos directly from a phone. Ideally they would be scaled and auto-rotated, and the EXIF orientation updated to reflect the change.
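As a small illustration of the auto-rotate step: the EXIF Orientation tag takes values 1 to 8, and values 5 to 8 imply a quarter turn, so width and height swap once the pixels are rotated upright. The helper names below are hypothetical, and the rotation table covers only the four non-mirrored values.

```typescript
// Clockwise rotation in degrees needed to display the stored pixels upright,
// for the non-mirrored EXIF Orientation values. Mirrored values (2, 4, 5, 7)
// are left out of this table for brevity.
const ROTATION_CW: Record<number, number> = { 1: 0, 3: 180, 6: 90, 8: 270 };

function rotationFor(orientation: number): number {
  return orientation in ROTATION_CW ? ROTATION_CW[orientation] : 0;
}

// Dimensions of the image after the orientation has been applied.
// Orientations 5 through 8 involve a quarter turn, so the axes swap.
function uprightSize(width: number, height: number, orientation: number): { width: number; height: number } {
  const swaps = orientation >= 5 && orientation <= 8;
  return swaps ? { width: height, height: width } : { width, height };
}
```

After baking in the rotation, the uploaded file's Orientation tag would be reset to 1 so that servers and viewers do not rotate it a second time.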

In addition some access to overwriting/removing some of the values in EXIF would be great. Sometimes there may be low-res previews or other binary data, in other cases there may be privacy related information. Being able to filter or remove these parts before uploading an image would be very useful.

Yes, assuming aspect ratio will always be “locked”, a relatively simple way to look at it is by specifying “max width” and “max height”. For example 300x300 would be “maximum width of 300px for landscape or maximum height of 300px for portrait”. Then having a zero as one of the values would be auto/unlimited. The same can also be 300w 0h, etc.
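To make the "max width / max height, zero means auto" reading concrete, here is a minimal sketch. `fitWithin` is a hypothetical helper, and it assumes the browser never upscales.

```typescript
interface Size { width: number; height: number; }

// Scale (w, h) down to fit within maxW x maxH while keeping the aspect ratio.
// A limit of 0 means auto/unlimited on that axis; images are never upscaled.
function fitWithin(w: number, h: number, maxW: number, maxH: number): Size {
  const scaleW = maxW > 0 ? maxW / w : Infinity;
  const scaleH = maxH > 0 ? maxH / h : Infinity;
  const scale = Math.min(1, scaleW, scaleH);
  return {
    width: Math.max(1, Math.round(w * scale)),
    height: Math.max(1, Math.round(h * scale)),
  };
}
```

With limits of 300x300, a 4000x3000 landscape photo comes out at 300x225 and a 3000x4000 portrait photo at 225x300, which matches the description above.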

Now that so much of the web is user-generated content, I think it makes sense to approach a solution with the HTML file input. And as you said, it’s unfair to server owners to do costly media encoding at scale.

I have no clue how feasible this would be, but from a UX perspective, if the browser is to handle this, I’m imagining some kind of media encoding queue across all browser windows. It would work asynchronously in the background and once a file is done, it directly uploads to the specified server, even if the user no longer has that window open – kind of like how the browser can continue downloading files even after closing the window.

Given the interesting history of the <img> tag, I wonder if Tim Berners-Lee has been thinking about this problem lately…

Resizing video in the client on upload… that can be very slow (especially on mobile), very expensive in battery, and a real drain on system performance until completed. It might also run into problems with disk space unless it’s streamed out as re-encoded, in which case you’re seriously increasing the risk of a failed upload.

Welp, the timing couldn’t be more perfect for this - we plan to propose the following video editing API to the Web Application WG this week but we can start it in the WICG since I think this overlaps many different teams (Web Apps, Perf & Media): https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/master/MediaBlob/explainer.md

We decided to start with video editing because we haven’t heard any complaints from webdevs regarding image editing on the client, while the pain for video editing is quite large (all you need to do is edit a video on YouTube and get the queue message to feel the poor UX).

We’ve been working with partners on an early prototype and have seen a minimum of 40x perf improvements from uploading only the necessary bits and transcoding only when necessary. The benefit of this approach is that in many scenarios you don’t want to simply transcode the video, but to manage how or what is transcoded.

Interested to hear your feedback on the proposal. @yoavweiss @marcosc @igrigorik Where do you all feel is the best place to formally propose this as we’d like to start working out more of the specifics (eg: we’re trying to think about error)

WICG feels like the right place… Just a heads up also to @mounir, as co-chair of the Media WG.