With recent breakthroughs in deep learning and related technologies, Machine Learning (ML) algorithms have drastically improved in accuracy, applicability, and performance. While ML is typically thought of as a server-side technology, the inference stage of a machine learning model can run on device as well. Developing a machine learning application usually involves two stages:
- The developer first trains the model by creating a skeleton framework and then iterating on it with a large dataset
- The developer then ports the model to a production environment so that it can infer insights from user input
Though training typically takes place in the cloud because it requires a significant amount of data and computing power, inference can take place either in the cloud or on the device. Running inference on the device has a number of appealing properties, such as the performance boost of edge computing, resilience to poor or absent network connectivity, and stronger security and privacy protection.
Although platforms for native applications have all shipped APIs to support machine learning inference on device, similar functionality has been missing from the web platform. To fill the gap, we could provide an API set including:
- WebAssembly with GPU and multi-thread support
- A WebML (Web Machine Learning) API with a pre-defined set of mathematical functions that the platform can optimize for
- A WebNN (Web Neural Network) API that provides a high-level abstraction to run neural networks efficiently
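To give a feel for the kind of predefined mathematical operations such an API would expose, here is a minimal sketch in plain JavaScript of a single fully connected layer with a ReLU activation. The function names and shapes are purely illustrative, not part of any proposal; the point is that these are exactly the ops (matrix multiply, add, relu) a browser could hand off to optimized native or GPU kernels instead of executing in script:

```javascript
// Illustrative only: one dense (fully connected) layer followed by ReLU,
// computed in plain JavaScript. A WebML/WebNN-style API would express the
// same computation as a graph of predefined ops the platform can optimize.

// Multiply a [rows x cols] weight matrix by an input vector.
function matmul(weights, input) {
  return weights.map(row =>
    row.reduce((sum, w, i) => sum + w * input[i], 0)
  );
}

// Element-wise addition of a bias vector.
function add(vec, bias) {
  return vec.map((v, i) => v + bias[i]);
}

// Rectified linear unit: max(0, x) applied element-wise.
function relu(vec) {
  return vec.map(v => Math.max(0, v));
}

// One inference step through a single layer.
function denseLayer(weights, bias, input) {
  return relu(add(matmul(weights, input), bias));
}

const weights = [[1, -1], [0.5, 0.5]];
const bias = [0, -2];
console.log(denseLayer(weights, bias, [2, 1])); // [1, 0]
```

Chaining such layers is all a neural network's inference pass amounts to, which is why a small, well-chosen set of optimizable primitives goes a long way.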
Please take a look at the explainer for more detailed information, such as use cases, the problem statement, the proposal, and related research. Feedback is welcome! I would love to hear what you think!