[Proposal] Native media optimization and (basic) editing support

Related discussion: crbug.com/1018022

Images account for ~50% of transferred bytes and remain one of the biggest optimization problems and opportunities on the web. The reasons why this problem persists are numerous and complex. An incomplete list, in no particular order…

  • Most users who upload media are not tech savvy enough to know about formats, file sizes, etc (nor should they be!). The natural flow is to pick the file from a media picker and hit upload.
  • The torso and long tail of the web are not set up to bear the cost of image optimization: the economics of the long tail are to host the largest number of sites and assets at the lowest cost; format optimization, resizing, etc., are CPU- and storage-intensive and are omitted.
  • Image optimization is hard™ and setting up CDNs or open source services requires awareness, technical know-how, and often a credit card.

We (the webperf community and web developers at large) have been beating our heads against this problem for over a decade but the “solution at scale” remains an unsolved challenge.

I will suggest that the only way to make progress in this space is to ensure that media leaving the device is optimized against criteria specified by the upload target. That is, the optimization should happen on the user’s device, before it is uploaded, and according to criteria specified by the service or site that is initiating the upload. Common criteria include: accepted file type, file size, aspect ratio or max width/height, or duration for video/animated images.

Prompts that reject an upload for having the wrong format, file size, or dimensions are common on the web and are a terrible user experience. As the user faced with such a dialog: how do I resize the image to fit, how do I reduce the file size, how do I crop?

The browser can fix both the technical and cost problems faced by site owners, as well as significantly improve the experience (latency, cost) for the user who is initiating the upload.

A smarter (image) file input ~ MVP

Note: all names below are examples, subject to naming bikeshed, etc.

<input type="file" accept="image/*">

We already support some control over the inputs (via `accept`) to file upload. What is missing are the output controls, which the site owner could specify to instruct the browser on how it should assist the user. Such outputs could be…

Output filetype

<input type="file" 
   accept="image/*"
   output="image/jpeg">

Transcoding files on the server incurs the cost of a potentially unoptimized upload for the user, as well as CPU transcoding costs for the server. In many cases the site knows the exact format it needs to receive, and the browser should be able to transcode the file before it is uploaded.

Maximum upload size

<input type="file" 
   accept="image/*"
   output="image/jpeg"
   outputMaxSize="1MB">

The browser should automatically re-encode the image on behalf of the user, at the best quality it can achieve within the specified limit. For example, if the user picks a 32MB image from their mobile gallery, the browser shouldn’t throw an error; it should do the work on behalf of the user to meet the page-specified criteria.
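
One way to picture "best quality within the limit" is a search over the encoder's quality setting. The following is only a sketch under stated assumptions: `EncodeAtQuality` and `bestQualityUnderLimit` are hypothetical names, the encoder is injected as a plain function that returns an encoded size in bytes, and encoded size is assumed not to shrink as quality goes up. A real browser encoder would be asynchronous and could use a smarter strategy.

```typescript
// Hypothetical encoder hook: given a quality in [0, 1], return the encoded
// size in bytes. Not a real browser API; it stands in for whatever encoder is used.
type EncodeAtQuality = (quality: number) => number;

// Binary-search for the highest quality whose output fits within maxBytes.
// Assumes encoded size never decreases as quality increases.
function bestQualityUnderLimit(
  encode: EncodeAtQuality,
  maxBytes: number,
  steps: number = 7,
): number | null {
  let lo = 0;
  let hi = 1;
  // Even the lowest quality is too large: the caller has to downscale instead.
  if (encode(lo) > maxBytes) return null;
  // The highest quality already fits: nothing to search for.
  if (encode(hi) <= maxBytes) return hi;
  for (let i = 0; i < steps; i++) {
    const mid = (lo + hi) / 2;
    if (encode(mid) <= maxBytes) {
      lo = mid; // fits, so try a higher quality
    } else {
      hi = mid; // too large, so try a lower quality
    }
  }
  return lo;
}
```

A `null` result is the signal that quality alone cannot meet the limit, which is where resizing (next section) would have to kick in.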

Dimension and aspect ratio constraints

<input type="file" 
   accept="image/*"
   output="image/jpeg"
   outputDimensions="1:1"> <!-- or “100x100px”, “1024px”... -->

The browser should automatically resize the image to the specified width or height requirement on behalf of the user. If the aspect ratio does not match the input image, the browser should offer a simple UI that lets the user choose what to crop. For bonus points, the browser can also provide smart crop previews to assist the user; apply and demonstrate some of that ML magic we keep hearing about!
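
To make the aspect-ratio case concrete, here is a rough sketch of the default crop a browser might propose before the user adjusts it: the largest centered rectangle with the requested ratio. `CropRect` and `centeredCrop` are made-up names for illustration, and a centered crop is just one possible default (a "smart" crop would pick a different rectangle).

```typescript
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Largest centered rectangle with aspect ratio ratioW:ratioH that fits
// inside a srcW x srcH image.
function centeredCrop(srcW: number, srcH: number, ratioW: number, ratioH: number): CropRect {
  const target = ratioW / ratioH;
  const source = srcW / srcH;
  if (source > target) {
    // Source is too wide: keep the full height, trim left and right.
    const width = Math.round(srcH * target);
    return { x: Math.floor((srcW - width) / 2), y: 0, width, height: srcH };
  }
  // Source is too tall (or already matches): keep the full width, trim top and bottom.
  const height = Math.round(srcW / target);
  return { x: 0, y: Math.floor((srcH - height) / 2), width: srcW, height };
}
```

For a 4000x3000 photo and a 1:1 target, this yields a 3000x3000 square starting 500px from the left edge.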


A smarter (video) file input ~ beyond MVP

(Note: I would suggest we start by exploring image oriented use cases first, but we should keep in mind that video has similar (and even more acute and amplified) challenges for sites and users)

Many of our mobile devices now shoot at 4K resolution, which clocks in at 10Mbps. Such uploads are costly for the user, prone to fail due to large file sizes, expensive to store, extremely expensive to manipulate on the server, and rarely what should be served to the user. For these reasons, free video optimization CDNs (modulo YouTube) are not a thing on the web.

The browser is in a position to help both the user and site owner: it can downsample the video prior to upload, convert to an alternate format if necessary, and provide a UI for the user to trim prior to upload, e.g…

<input type="file"
   accept="video/*"
   output="video/mp4"
   maxLength="10s"
   outputMaxSize="10MB">
  • Provide a UI to trim video to specified length prior to upload
  • Enable the site to provide and enforce max filesize, accepted format.
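
For a sense of scale, the size cap and maximum length in the example above translate directly into a bitrate budget. This is back-of-the-envelope arithmetic only: `targetBitrate` and `clipBytes` are illustrative names, 1MB is taken to mean 1,000,000 bytes, and container and audio overhead are ignored.

```typescript
// Average bitrate (bits per second) a clip may use so that durationSeconds
// of video fits within maxBytes. Container and audio overhead are ignored.
function targetBitrate(maxBytes: number, durationSeconds: number): number {
  if (durationSeconds <= 0) throw new RangeError("duration must be positive");
  return (maxBytes * 8) / durationSeconds;
}

// Size in bytes of a clip recorded at a given average bitrate.
function clipBytes(bitsPerSecond: number, durationSeconds: number): number {
  return (bitsPerSecond * durationSeconds) / 8;
}
```

Under these assumptions a 10MB cap on a 10 second clip leaves an 8Mbps budget, while one minute of footage at the 10Mbps figure quoted earlier is about 75MB before any optimization.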

FAQ

Doesn’t canvas already allow me to resize, re-encode? Also, WASM?

In theory, yes, some of this is possible today (squoosh.app is an example) and proposed capabilities like WebCodecs might unlock even more powerful use cases in the future. However, the fact that something is or may be possible in the future does not automatically mean that it will be adopted at scale and by all sites.

In practice, resizing, size optimization, cropping, etc., are hard technical problems to implement. Case in point: it’s been possible to resize images via canvas for a long time, but that’s not a common or widely used best practice. The browser can and needs to own this problem if we want to see change at scale. At the same time, for those that want to pop the hood and implement their own variant: great, we’ve got APIs for you!

How have others solved it today?

This problem space is hard. In fact, entire companies have been built around it: UploadCare, FileStack, etc. What’s described in this doc is a (small) subset of the services they offer, but a critical subset that should not require a credit card or technical knowledge to integrate.

There are free image CDNs, doesn’t that solve the problem?

No, CDNs do not solve all the problems.

At upload time, the important criterion is that the media is optimized locally and before it is sent to the upload (potentially, CDN) server: the user must be able to perform basic editing operations like crop, video trim, rotate, etc., locally and with low latency; the file must be (re)encoded prior to upload to maximize likelihood of upload success and minimize data cost for the user, as well as processing and storage costs for the server. Further, a CDN should not be a requirement for delivering a reasonable user experience on the web; recommended, perhaps, but not required…

At serving time, yes, CDNs are and will remain an important best practice, as they can perform further device specific optimizations (e.g. re-encode and serve different formats for various browsers), apply further customizations and transforms, provide media management, etc.

Output filetype

Could this be a list in priority order? “Please give me webp if you can, otherwise JPEG”.
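
A priority list like this could resolve along the following lines. Purely a sketch: `pickOutputType` is a hypothetical helper, and the list of formats the browser can encode is passed in rather than detected.

```typescript
// Return the first MIME type in the site's priority list that the browser
// is able to encode, or null if none of them is supported.
function pickOutputType(preferred: string[], encodable: string[]): string | null {
  for (const type of preferred) {
    const normalized = type.trim().toLowerCase();
    if (encodable.indexOf(normalized) !== -1) return normalized;
  }
  return null;
}
```

With a preference of webp then JPEG, a browser that can only encode JPEG and PNG would end up producing JPEG.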

Maximum upload size

This is great. However, the quality of browsers’ JPEG encoders is pretty varied. Would it be worth including a ‘quality’ option instead, where the way to determine quality was standardised? This means browsers with poorer JPEG encoders would produce larger files, but quality would be consistent.

It feels like this would be in addition to the maximum upload size, where the maximum upload size would take priority.

Dimension and aspect ratio constraints

It isn’t clear to me how the user would specify “200px height but auto width” or “300px width but auto height”. Maybe use 200h and 300w?

Yeah, that makes sense! The Accept header enables the client to advertise what it is able to accept from the server, and in this case it’s the inverse, with the server expressing what it is able to accept from the client.

Personally, I would steer clear of this. As you said yourself, “quality” settings vary between formats and, for each format, between encoders. I don’t think that’s a problem we can solve, or need to drag into this discussion. I would still focus on output size as the leading signal, and defer to the browser to do its best.

Not sure what the best syntax here is but yeah that makes sense. I’m hoping that we can find some precedent that we can lean on here. Something in CSS? The key use cases I see are: constrain by width, constrain by height, constrain by width and height, constrain by aspect ratio… which, I guess is similar to previous.

Pre-processing of images by the browser before uploading them sounds great. True, this can be achieved (more or less) by using canvas but that has several drawbacks. Apart from scaling, another common case is rotating JPEG images according to EXIF orientation.

A common use case is uploading photos directly from a phone. Ideally they would be scaled and auto-rotated, and the EXIF orientation updated to reflect the change.
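
As a small illustration of the auto-rotate step: EXIF Orientation takes values 1 through 8, and values 5 through 8 involve a quarter turn, so the upright image has width and height swapped. The helper names below are made up for this sketch, and the mirrored orientations (2, 4, 5, 7) are only flagged rather than fully handled.

```typescript
interface Size {
  width: number;
  height: number;
}

// Clockwise rotation in degrees needed to display the image upright, for
// the non-mirrored EXIF Orientation values. Returns null for mirrored
// values (2, 4, 5, 7) and for anything outside 1..8.
function clockwiseRotation(orientation: number): number | null {
  switch (orientation) {
    case 1: return 0;
    case 3: return 180;
    case 6: return 90;
    case 8: return 270;
    default: return null;
  }
}

// Dimensions of the image once the orientation has been applied to the
// pixels. Orientations 5 through 8 include a quarter turn, so the sides swap.
function uprightSize(stored: Size, orientation: number): Size {
  const quarterTurn = orientation >= 5 && orientation <= 8;
  return quarterTurn
    ? { width: stored.height, height: stored.width }
    : { width: stored.width, height: stored.height };
}
```

Once the rotation is baked into the pixels, the Orientation value written back would be 1, which is the "updated to reflect the change" part.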

In addition some access to overwriting/removing some of the values in EXIF would be great. Sometimes there may be low-res previews or other binary data, in other cases there may be privacy related information. Being able to filter or remove these parts before uploading an image would be very useful.

Yes, assuming aspect ratio will always be “locked”, a relatively simple way to look at it is by specifying “max width” and “max height”. For example 300x300 would be “maximum width of 300px for landscape or maximum height of 300px for portrait”. Then having a zero as one of the values would be auto/unlimited. The same can also be 300w 0h, etc.
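
Here is roughly how that could behave, as a sketch only. The "300w 0h" token syntax is the strawman from this thread, not an agreed format; `parseMaxBox` and `fitWithin` are hypothetical names, and zero means "no limit" on that axis. Images are scaled down to fit and never scaled up.

```typescript
interface MaxBox {
  maxWidth: number;  // 0 means unlimited
  maxHeight: number; // 0 means unlimited
}

// Parse strawman values such as "300w 0h", "300w", or "200h".
function parseMaxBox(value: string): MaxBox {
  const box: MaxBox = { maxWidth: 0, maxHeight: 0 };
  for (const token of value.trim().split(/\s+/)) {
    const match = /^(\d+)([wh])$/.exec(token);
    if (!match) throw new SyntaxError("unrecognized token: " + token);
    const pixels = parseInt(match[1], 10);
    if (match[2] === "w") {
      box.maxWidth = pixels;
    } else {
      box.maxHeight = pixels;
    }
  }
  return box;
}

// Scale down (never up), preserving aspect ratio, to fit within the box.
function fitWithin(srcW: number, srcH: number, box: MaxBox): { width: number; height: number } {
  const scaleW = box.maxWidth > 0 ? box.maxWidth / srcW : Infinity;
  const scaleH = box.maxHeight > 0 ? box.maxHeight / srcH : Infinity;
  const scale = Math.min(1, scaleW, scaleH);
  return {
    width: Math.max(1, Math.round(srcW * scale)),
    height: Math.max(1, Math.round(srcH * scale)),
  };
}
```

A 300x300 box turns a 4000x3000 landscape photo into 300x225 and a 3000x4000 portrait photo into 225x300, which matches the "max width for landscape, max height for portrait" reading.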

Now that so much of the web is user-generated content, I think it makes sense to approach a solution with the HTML file input. And as you said, it’s unfair to server owners to do costly media encoding at scale.

I have no clue how feasible this would be, but from a UX perspective, if the browser is to handle this, I’m imagining some kind of media encoding queue across all browser windows. It would work asynchronously in the background and once a file is done, it directly uploads to the specified server, even if the user no longer has that window open – kind of like how the browser can continue downloading files even after closing the window.

Given the interesting history of the <img> tag, I wonder if Tim Berners-Lee has been thinking about this problem lately…

Resizing video in the client on upload… that can be very slow (especially on mobile), very expensive in battery, and a real drain on system performance until completed. It might also run into problems with disk space unless it’s streamed out as re-encoded, in which case you’re seriously increasing the risk of a failed upload.

Welp, the timing couldn’t be more perfect for this - we plan to propose the following video editing API to the Web Application WG this week but we can start it in the WICG since I think this overlaps many different teams (Web Apps, Perf & Media): https://github.com/MicrosoftEdge/MSEdgeExplainers/blob/master/MediaBlob/explainer.md

We decided to start with video editing because we haven’t heard any complaints from webdevs regarding image editing on the client; while the pain for video editing is quite large (all you need to do is edit a video on YouTube and get the queue message to feel the poor UX).

We’ve been working with partners on an early prototype and have seen a minimum of 40x perf improvements due to uploading only necessary bits and only transcoding when necessary. The benefit of this approach is that in many scenarios you don’t want to simply transcode the video, but to manage how or what is transcoded.

Interested to hear your feedback on the proposal. @yoavweiss @marcosc @igrigorik Where do you all feel is the best place to formally propose this as we’d like to start working out more of the specifics (eg: we’re trying to think about error)

WICG feels like the right place… Just a heads up also to @mounir, as co-chair of the Media WG.

So it seems that a WICG repo is reasonable. Since this is in our Edge explainer repo, which I’m not going to transfer, can an admin spin up a video-editing repo? I’ll move the explainer over. I’ll detangle the image editing from this, as Ilya and I discussed this at CDS and we don’t think they should be coupled.

I’ve set up a video-editing repo here: https://github.com/WICG/video-editing

Let me know if you want me to also spin up a separate image-editing one.

Happy incubating!!

Thanks all for the great feedback! There are a few different places where related discussions took place, so I want to capture a few highlights and use cases for reference to help scope what we explore here…


WordPress core media

@azaozz for reference, copying some of the points you made in the WP core-media slack thread:

In Core we’ve looked at “in browser” image manipulations (scaling and rotating) several times since the media modal was added in 3.5. The main problems at the time were that it won’t work on phones as it takes huge amount of RAM, and that there was no good way to write EXIF after rotating…

The EXIF thing is the biggest hurdle at the moment. There’s no good/straightforward way to change the “orientation” value there from js… Also some phones and cameras add… things there that need to be removed, like “low-res preview” etc.

If this was handled by the browser more or less “automatically”, I’d be a bit +1 to implement in WP as soon as possible (even if it would mean dealing with “polyfills” etc.)

Tinder

Roderick, crbug comment:

Web app could have the UI to show the maximum size user could upload then user could choose the size within the range without client side calculation after uploaded. Will be also helpful if we could directly define max/min size on the input html tag, and browser will directly hide/fade the file options for ineligible file.

Twitter

Nolan O’Brien (Images lead at Twitter), crbug comment:

"Having access to that native photo picker (for resizing) would obviously be great. The options of original, large, medium and small are not as useful to us and our needs, and we’d frankly want to specify something specific for size constraints, but see the generalized use cases value for most app/site devs that don’t have our scale.

Going into what would really be valuable, us being able to specify the encoding format would be big. We could start with image/png and image/jpeg - then we could have room to experiment with different formats like adding webp or heic support. That way, when there is a photo in an unsupported file format, it can be auto converted to a supported format.

Granular “preferred” format encoding config options would be great too, avoid us having to double encode things and introducing quality loss. Such as: image/jpeg;q=85;chroma-subsampling=4:2:0

Specifying the target size bounds would be ideal too. Then we could have 2048x2048 uploads for mobile and 4096x4096 for desktop/tablet and experiment."
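
For reference, a format string like the one quoted above splits cleanly into a type plus parameters. This is only a sketch: `parseOutputFormat` is a hypothetical helper, it does not handle quoted parameter values, and the meaning of parameters such as `q` or `chroma-subsampling` would still have to be defined.

```typescript
interface OutputFormat {
  type: string;
  params: Record<string, string>;
}

// Split a value such as "image/jpeg;q=85;chroma-subsampling=4:2:0" into
// its media type and a map of parameters. Malformed parameters are skipped.
function parseOutputFormat(value: string): OutputFormat {
  const [rawType, ...rawParams] = value.split(";");
  const params: Record<string, string> = {};
  for (const raw of rawParams) {
    const eq = raw.indexOf("=");
    if (eq === -1) continue; // no "=", so not a name=value pair
    const name = raw.slice(0, eq).trim().toLowerCase();
    if (name) params[name] = raw.slice(eq + 1).trim();
  }
  return { type: rawType.trim().toLowerCase(), params };
}
```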

Other

@Jake_Archibald (crbug): this should include an option to strip metadata. This has huge file size benefits, especially since some phones embed video data within the image, but it’s also great for privacy, since users may not be aware that images can include geo tags etc. Do we strip any of this stuff already?
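
To give a feel for what stripping involves for JPEG: EXIF (including geo tags) and XMP live in APP1 segments near the start of the file, so dropping those segments removes them without touching the image data. The sketch below is deliberately minimal and makes assumptions: `stripApp1` is a made-up name, only APP1 (marker 0xFFE1) is removed, other segments such as ICC profiles are kept, and unusual files (fill bytes between segments, data appended after the image) are not handled.

```typescript
// Remove APP1 (EXIF/XMP) segments from a JPEG byte stream, leaving every
// other segment and the compressed image data untouched.
function stripApp1(jpeg: Uint8Array): Uint8Array {
  if (jpeg.length < 4 || jpeg[0] !== 0xff || jpeg[1] !== 0xd8) {
    throw new Error("not a JPEG: missing SOI marker");
  }
  const kept: Uint8Array[] = [jpeg.subarray(0, 2)]; // SOI marker
  let pos = 2;
  while (pos + 4 <= jpeg.length && jpeg[pos] === 0xff) {
    const marker = jpeg[pos + 1];
    if (marker === 0xda) break; // SOS: compressed data follows, copy the rest as-is
    // Segment length is big-endian and includes its own two bytes.
    const length = (jpeg[pos + 2] << 8) | jpeg[pos + 3];
    const end = pos + 2 + length;
    if (length < 2 || end > jpeg.length) throw new Error("corrupt segment length");
    if (marker !== 0xe1) kept.push(jpeg.subarray(pos, end)); // drop APP1 only
    pos = end;
  }
  kept.push(jpeg.subarray(pos)); // SOS header, image data, and EOI
  const total = kept.reduce((sum, part) => sum + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of kept) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

If the browser re-encodes the image anyway, it can simply not write the metadata in the first place; a pass like this matters mostly when the original bytes are uploaded unchanged.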

@Matthieu (crbug): Love the idea of native resize. Would love it even more with a declarative way to specify the desired resolution, quality, … which of course would depend on the browser support and user approval. +1 the option to strip metadata, and please don’t add any new metadata as the images are resized/re-encoded.

As a brief aside, recently gave a talk / retrospective at performance.now() on why we haven’t been able to solve the image optimization problem at scale, and how we ought to approach this problem moving forward. Spoiler, if you’re reading this, you won’t be shocked that — I believe — providing the primitives outlined in this thread is a critical part of that solution.


@gregwhitworth to close the loop on our CDS discussion…

First off, said it in person but for public record want to add a strong +1 to the proposal and the API you are exploring. I have some questions and feedback on the specifics, but I’ll file those on the new WICG repo (update: here). The one big thing that’s probably worth calling out here is…

I flagged video as a potential (declarative markup) use case to explore in this proposal, but there are definitely more considerations and complications when it comes to video. For example, implications of how large clips are processed, how progress is communicated to the user, how the upload is handled, and so on. In that light, starting with a promise-based API approach is the right strategy here.

We can run these things in parallel and adjacent tracks. I’ll focus this exploration on declarative primitives for (basic) image manipulation, we can pursue an API approach for (basic) video editing, and perhaps somewhere down the road we can revisit a declarative solution for video as well.


@marcosc @yoavweiss tactical question… I think there is substantial amount of support and feedback here (see above) for exploring this space, and I’d like to create an explainer and engage interested site+browser developers in the design. Would you be open to creating a WICG repo for this work?

In terms of the name, perhaps WICG/image-editing to keep things simple and mirror

  • WICG/image-constraints
  • WICG/image-optimization
  • WICG/image-editing

I think I’d lean towards constraints or optimization, since that’s what it’s mostly all about. WDYT?