ONNX runtime: Cross-platform accelerated machine learning

  • Maybe relevant, since Azure is used as an example: MSFT & Meta recently worked on ONNX-based deployment of Llama 2 in Azure and WSL: https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-me...

    (disclaimer: I work at GH/MSFT, not connected to the Llama 2 project)

  • I would say onnx.ai [0] provides more information about ONNX for those who aren’t working with ML/DL.

    [0] https://onnx.ai

  • ONNX is cool. For one, it runs (large) transformer models on the CPU twice as fast as pytorch/transformers. But at the moment it lacks a number of crucial features. Specifically:

    Its reliance on Google's protobuf, with its 2 GB single-file limit, is an extreme limitation. Yes, you can keep weights outside your model file, but many operations (model slicing) still fail.

    Second, there's no way to offload parts of the model to disk or CPU (like Hugging Face Accelerate does) while the rest executes on the GPU.

    Third, there's no easy way to partition existing large models. You can delete nodes, but then fixing up the input/output formats means manually editing text files. The workflow is ridiculous (convert the ONNX to text with pdoc, edit in a text editor, convert back to binary).

    I really wish they'd fix all of this and more (a rough workaround sketch for the first and third points follows below).
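
    For what it's worth, the external-data path plus onnx.utils.extract_model cover part of this today, though as noted above they can still choke on very large graphs. A rough Python sketch (the file and tensor names are made up):

      import onnx

      # Work around protobuf's 2 GB limit by moving weights into a sidecar file.
      model = onnx.load("big_model.onnx")
      onnx.save_model(
          model,
          "big_model_external.onnx",
          save_as_external_data=True,
          all_tensors_to_one_file=True,
          location="big_model.weights",
          size_threshold=1024,
      )

      # Extract a sub-model between named tensors instead of hand-editing text dumps.
      # The tensor names here are hypothetical; use the ones from your own graph.
      onnx.utils.extract_model(
          "big_model_external.onnx",
          "first_half.onnx",
          input_names=["input_ids"],
          output_names=["encoder_layer_11_output"],
      )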

  • There are two kinds of runtime: training and inference. ONNX Runtime, as far as I know, is only for inference, which is open to all.

  • I'm personally more excited by StableHLO and/or Tinygrad as portable intermediate languages for ML. They're more RISC. ONNX seems to have almost 200 ops, StableHLO about 100, and Tinygrad about 30.

  • There's also a third-party WebGPU implementation: https://github.com/webonnx/wonnx

  • Is anyone using ONNX-compiled models with Triton Inference Server? Is it worth it? How does it compare to other options like TorchScript or TensorRT?
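
    For the TensorRT comparison specifically, ONNX Runtime can hand supported subgraphs to TensorRT itself via its execution providers, which makes a head-to-head easy to set up. A minimal sketch (the model path is a placeholder, and provider availability depends on how your ORT build was compiled):

      import onnxruntime as ort

      # Supported subgraphs go to TensorRT; the rest falls back to CUDA, then CPU.
      sess = ort.InferenceSession(
          "model.onnx",
          providers=[
              "TensorrtExecutionProvider",
              "CUDAExecutionProvider",
              "CPUExecutionProvider",
          ],
      )
      print(sess.get_providers())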

  • Nice. A while ago, there were new AI Python projects that came out and needed these binaries, and the website install wasn't available or documented.

    Many users didn't want to install random binaries (security), and the devs didn't document or link directly to the corporate websites.

    Now it's as easy as a pip install (minimal example below), which is going to make things easier.

    The community is moving faster than the corps making the tools.
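
    For reference, a minimal CPU inference sketch after the pip install mentioned above (the model path and input shape are placeholders):

      # pip install onnxruntime   (GPU builds are the separate onnxruntime-gpu package)
      import numpy as np
      import onnxruntime as ort

      sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
      input_name = sess.get_inputs()[0].name

      # Dummy image-shaped input; replace with whatever your model expects.
      x = np.random.rand(1, 3, 224, 224).astype(np.float32)
      outputs = sess.run(None, {input_name: x})
      print(outputs[0].shape)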

  • What's cool is that you can run Onnx models in the browser!

    I have written about it in my blog: https://www.zaynetro.com/post/run-ml-on-devices

  • ONNX is nice in principle, but pretty limited. Core ops like non-max suppression don't convert properly. Model deployment is also not great; memory consumption, and control over it, is worse than with TensorFlow.
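
    NMS conversion does seem to depend heavily on the framework and the opset: torchvision's nms, for example, has an ONNX lowering to the NonMaxSuppression op from opset 11 onward. A small export sketch (the threshold, shapes, and file name are arbitrary):

      import torch
      from torchvision.ops import nms

      class NMSWrapper(torch.nn.Module):
          def forward(self, boxes, scores):
              # torchvision.ops.nms maps to the ONNX NonMaxSuppression op (opset >= 11)
              return nms(boxes, scores, iou_threshold=0.5)

      boxes = torch.rand(100, 4)   # (x1, y1, x2, y2); dummy values for tracing
      scores = torch.rand(100)

      torch.onnx.export(
          NMSWrapper(),
          (boxes, scores),
          "nms.onnx",
          opset_version=17,
          input_names=["boxes", "scores"],
          output_names=["keep"],
      )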

  • Would it not be better to use https://github.com/tinygrad/tinygrad as an intermediary framework?

  • Does it run on any of the BSDs?
