API Set for Machine Learning on the Web


With the recent breakthroughs in deep learning and related technologies, Machine Learning (ML) algorithms have drastically improved in terms of accuracy, applicability, and performance. While typically thought of as a server-side technology, the inference stage of a machine learning model can run on device as well. Development of a machine learning application usually involves two stages:

  • The developer first trains the model by creating a skeleton framework and then iterating on it with a large dataset
  • The developer then ports the model to a production environment so that it can infer insights from user input

Though training typically takes place in the cloud because it requires a significant amount of data and computing power, inference can take place in the cloud or on the device. Running inference on the device has a number of appealing properties, such as a performance boost from edge computing, resilience to poor or absent network connectivity, and security/privacy protection.

Although platforms for native applications have all shipped APIs to support machine learning inference on device, similar functionality has been missing on the web platform. To fill the gap, we could provide an API set including:

  1. WebAssembly with GPU and multi-thread support
  2. A WebML (Web Machine Learning) API with a pre-defined set of mathematical functions that the platform can optimize for
  3. A WebNN (Web Neural Network) API that provides a high-level abstraction to run neural networks efficiently.

Please take a look at the explainer for more detailed info, such as the use cases, problem statement, proposal, related research, etc. Feedback is welcome! I would love to hear more about what you think :grinning: !


GPGPU computing on WebAssembly would suggest that the native side has been standardized, which it has not. I believe the de facto standard is CUDA, but since that is far from an open standard, I'm not sure how that would work. AMD has been working on ROCm and HPC, which somewhat define an open implementation of CUDA, but it is very specific to their hardware, and binding nvcc into the browser doesn't seem like a viable way forward for implementations.

On the other hand, there has been a community group around this: https://www.w3.org/community/gpu/ Safari is the only (and highly experimental) implementation shipping it. I remember seeing someone using this to implement a neural network library.

I would very much like to see some work in this general direction. (2) (linear algebra subroutines) is something that would be nice to have as a first-class citizen, possibly with a slightly more usable matrix/vector type. (3) is a bit iffy, as most common libraries used nowadays have a dependency on yet another closed-source library (cuDNN, although it is possible to live without it, at the cost of slower performance).


WebDNN was implemented with WebGPU, WebGL, and WebAssembly. The explainer has a link to a list of JS libraries written for this: https://github.com/AngeloKai/js-ml-libraries


IMHO, both (2) and (3) look good to me. Several neural network frameworks such as Core ML, WebDNN, etc. offer their own model data converters for well-known libraries such as Keras, Caffe, and TensorFlow, so we could assume neural network model compatibility to some extent. Of course, (2) looks better for extensibility.

FYI: Neural network acceleration is now not limited to GPGPU. There are some examples on native APIs:

  • The Android 8.1 NDK provides the Neural Networks API, used by TensorFlow Lite, Caffe2, etc., which can be integrated with a dedicated neural network processor in the device.
  • iOS 11 provides the Core ML framework, which makes use of the neural engine in the A11 chip if available.


Thanks for initiating the discussion! I am excited about the idea of bringing a hardware-accelerated machine learning API to the web platform and would like to contribute.

I’d like to echo the problem statement from a hardware optimization angle:

  1. Today’s web platform is disconnected from the most efficient neural network implementations for CPU and GPU.
  2. It is also disconnected from the emerging neural network accelerators.

For #1, using WebAssembly optimization as an example, it is possible to optimize neural network inference with the 128-bit SIMD of WebAssembly. However, a native implementation, like MKL-DNN, is able to leverage wider SIMD instructions, e.g. AVX-512, if they are available on the device’s CPU. Optimization via WebGL/WebGPU is in a similar situation compared to GPU optimization in native DNN libraries.
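To make the SIMD-width gap concrete, here is a minimal sketch of the kind of hot loop involved. The scalar inner product below is the core of many neural network layers; 128-bit Wasm SIMD could process 4 float32 lanes of it per instruction, while a native AVX-512 implementation handles 16.

```javascript
// Scalar inner-product kernel (illustrative only): the hot loop that
// SIMD vectorizes. 128-bit lanes fit 4 float32 values; AVX-512 fits 16.
function dot(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

const a = new Float32Array([1, 2, 3, 4]);
const b = new Float32Array([5, 6, 7, 8]);
console.log(dot(a, b)); // 1*5 + 2*6 + 3*7 + 4*8 = 70
```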

For #2, the hardware industry is moving fast to innovate on AI accelerators. Those AI accelerators, from DSPs and FPGAs to dedicated ASICs, improve performance as well as reduce power consumption. In particular, that makes efficient neural network inference on edge devices possible.

I agree that a dedicated accelerated machine learning API would fill the gap and enable innovative AI-based web applications on edge devices.

Regarding the API scope, Angelo proposed three aspects: 1) WebAssembly with GPU and multi-thread support, 2) WebML, and 3) WebNN. It is a complete set. Among them, I happen to have looked into a Web API for neural network inference. I think it corresponds to 2) or 3)? I’d like to share my thoughts and welcome feedback.

The Web API for accelerated neural network inference should:

  1. allow building a neural network from common building blocks, for example convolution, pooling, softmax, normalization, and activation.
  2. allow compiling the neural network to a native optimized format for hardware execution.
  3. allow setting up inputs from various sources on the web, e.g. media streams, scheduling the asynchronous hardware execution, and retrieving the output when hardware execution completes.
  4. allow extending the API with new building blocks if they are widely supported natively.
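As a concrete illustration of point 1, here is a minimal sketch of one such building block (softmax) in plain JavaScript over a TypedArray. This is not a proposed API surface, just the operation a native backend would accelerate.

```javascript
// Softmax building block (illustrative sketch, not a proposed API).
// Converts a vector of logits into a probability distribution.
function softmax(logits) {
  // Subtract the max for numerical stability.
  const max = Math.max(...logits);
  const exps = Float32Array.from(logits, (x) => Math.exp(x - max));
  const sum = exps.reduce((acc, x) => acc + x, 0);
  return Float32Array.from(exps, (x) => x / sum);
}

const probs = softmax(new Float32Array([1.0, 2.0, 3.0]));
console.log(probs); // probabilities summing to 1, largest for the 3.0 logit
```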

With such a Web API, web apps and libraries:

  1. can enable various use cases by connecting text, image, audio, video, and sensor data to the neural network as inputs.
  2. can get the best power and performance for neural network inference by offloading to native implementations and exploiting the hardware capabilities.
  3. have the flexibility to integrate different neural network architectures, e.g. MobileNet, SqueezeNet, and DenseNet, to name a few, and various model formats, e.g. those of ONNX, Caffe, TensorFlow, etc.
  4. can still innovate with new building blocks in WebAssembly and WebGL/WebGPU, thought of as a polyfill, and get acceleration once the API extension is available.
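To illustrate point 4, a "new building block" can start life as a script-level polyfill. The sketch below (my own illustrative code, not part of any proposal) implements 2x2 max pooling with stride 2 over a row-major Float32Array; a native backend could later accelerate the identical operation.

```javascript
// Polyfill-style building block sketch: 2x2 max pooling, stride 2,
// over a row-major height x width Float32Array.
function maxPool2x2(input, height, width) {
  const outH = height >> 1;
  const outW = width >> 1;
  const out = new Float32Array(outH * outW);
  for (let y = 0; y < outH; y++) {
    for (let x = 0; x < outW; x++) {
      const i = 2 * y * width + 2 * x; // top-left of the 2x2 window
      out[y * outW + x] = Math.max(
        input[i], input[i + 1],
        input[i + width], input[i + width + 1]
      );
    }
  }
  return out;
}

const input = new Float32Array([
  1, 2, 5, 6,
  3, 4, 7, 8,
  9, 1, 2, 3,
  4, 5, 6, 7,
]);
console.log(maxPool2x2(input, 4, 4)); // Float32Array [4, 8, 9, 7]
```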


LGTM, @Ningxin_Hu.

Regarding input interfaces, I’d like to clarify our current situation:

  • We can obtain decoded image data as ImageData via CanvasRenderingContext2D.getImageData().
  • We can process audio data via AudioWorklet or ScriptProcessorNode (to be deprecated) in the Web Audio API.
  • We can obtain a frame from a <video> element by drawing the frame on a <canvas> element.
  • We can obtain a frame from a MediaStream by attaching the stream to srcObject in a <video> element. Note that there is no equivalent spec to Web Audio API for video streams yet.

Generally, one possible minimum requirement is that an ArrayBuffer, TypedArray, or string could be an input. Also, it might be desirable for real-time input like a MediaStream or MediaStreamTrack to be accepted as well.
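Bridging these input interfaces to a TypedArray is straightforward. As a sketch: ImageData.data is a Uint8ClampedArray of RGBA bytes, and many models expect a normalized float tensor with the alpha channel dropped. The [0, 1] scaling below is an assumption; models differ in their expected normalization.

```javascript
// Sketch: convert ImageData-style RGBA bytes into a normalized
// Float32Array tensor (RGB only). Scaling to [0, 1] is an assumption.
function rgbaToFloatTensor(data, width, height) {
  const tensor = new Float32Array(width * height * 3);
  for (let i = 0, j = 0; i < data.length; i += 4, j += 3) {
    tensor[j] = data[i] / 255;         // R
    tensor[j + 1] = data[i + 1] / 255; // G
    tensor[j + 2] = data[i + 2] / 255; // B (alpha at i + 3 is dropped)
  }
  return tensor;
}

// One white and one black pixel, as getImageData() would return them.
const pixels = new Uint8ClampedArray([255, 255, 255, 255, 0, 0, 0, 255]);
console.log(rgbaToFloatTensor(pixels, 2, 1)); // [1, 1, 1, 0, 0, 0]
```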


Thanks for the clarification, @tomoyukilabs.

IMO, TypedArray input is an MVP feature.


There is one concern that loading/parsing model data in the hundreds of megabytes might be too heavy for the JavaScript runtime. Can we consider another approach that loads/parses model data without storing it as a JavaScript variable?
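One direction would be incremental loading: consume the payload chunk by chunk (for example from a fetch() ReadableStream in the browser) and hand each chunk to a native-side parser, so the full model is never held in one JavaScript variable. The sketch below simulates the chunk source; the handler and sizes are hypothetical.

```javascript
// Sketch of incremental model loading. chunkSource simulates a
// streaming payload (e.g. response.body from fetch()); the real
// model bytes and the native-side parser are hypothetical.
function* chunkSource(totalBytes, chunkSize) {
  for (let off = 0; off < totalBytes; off += chunkSize) {
    yield new Uint8Array(Math.min(chunkSize, totalBytes - off));
  }
}

function loadModelIncrementally(chunks, onChunk) {
  let consumed = 0;
  for (const chunk of chunks) {
    onChunk(chunk);             // hand each chunk off for parsing
    consumed += chunk.byteLength;
  }
  return consumed;              // total bytes streamed, never buffered at once
}

const total = loadModelIncrementally(chunkSource(10000, 4096), () => {});
console.log(total); // 10000
```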