ONNX runtime: Cross-platform accelerated machine learning

  • Maybe relevant, since Azure is used as an example: MSFT & Meta recently worked on ONNX-based deployment of Llama 2 in Azure and WSL: https://blogs.microsoft.com/blog/2023/07/18/microsoft-and-me...

    (disclaimer: I work at GH/MSFT, not connected to the Llama 2 project)

  • I would say onnx.ai [0] provides more information about ONNX for those who aren’t working with ML/DL.

    [0] https://onnx.ai

  • ONNX is cool. For one, it runs (large) transformer models on the CPU twice as fast as pytorch/transformers. But at the moment it lacks a number of crucial features. Specifically:

    Its reliance on Google's protobuf, with its 2 GB single-file limit, is an extreme limitation. Yes, you can keep weights outside your model file, but many operations (model slicing) still fail.

    Second, there's no way to offload parts of the model to disk or CPU (like Hugging Face Accelerate does) while the rest executes on the GPU.

    Third, there's no easy way to partition existing large models. You can delete nodes, but then fixing up the input/output formats means manually editing text files. The workflow is ridiculous (convert the ONNX to text with pdoc, edit in a text editor, convert back to binary).

    I really wish they'd fix all of this and more (a rough workaround sketch for the first and third points follows below).
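
    For what it's worth, the external-data path plus onnx.utils.extract_model cover part of this today, though as noted above they can still choke on very large graphs. A rough Python sketch (the file and tensor names are made up):

      import onnx

      # Work around protobuf's 2 GB limit by moving weights into a sidecar file.
      model = onnx.load("big_model.onnx")
      onnx.save_model(
          model,
          "big_model_external.onnx",
          save_as_external_data=True,
          all_tensors_to_one_file=True,
          location="big_model.weights",
          size_threshold=1024,
      )

      # Extract a sub-model between named tensors instead of hand-editing text dumps.
      # The tensor names here are hypothetical; use the ones from your own graph.
      onnx.utils.extract_model(
          "big_model_external.onnx",
          "first_half.onnx",
          input_names=["input_ids"],
          output_names=["encoder_layer_11_output"],
      )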

  • There are two kinds of runtime: training and inference. ONNX Runtime, as far as I know, is only for inference, which is open to all.

  • I'm personally more excited by StableHLO and/or Tinygrad as portable intermediate languages for ML. They're more RISC. ONNX seems to have almost 200 ops, StableHLO about 100, and Tinygrad about 30.

  • There's also a third-party WebGPU implementation: https://github.com/webonnx/wonnx

  • Is anyone using ONNX-compiled models with Triton Inference Server? Is it worth it? How does it compare to other options like TorchScript or TensorRT?
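
    For the TensorRT comparison specifically, ONNX Runtime can hand supported subgraphs to TensorRT itself via its execution providers, which makes a head-to-head easy to set up. A minimal sketch (the model path is a placeholder, and provider availability depends on how your ORT build was compiled):

      import onnxruntime as ort

      # Supported subgraphs go to TensorRT; the rest falls back to CUDA, then CPU.
      sess = ort.InferenceSession(
          "model.onnx",
          providers=[
              "TensorrtExecutionProvider",
              "CUDAExecutionProvider",
              "CPUExecutionProvider",
          ],
      )
      print(sess.get_providers())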

  • Nice. A while ago, there were new AI Python projects that came out and needed these binaries, and the website install wasn't available or documented.

    Many users didn't want to install random binaries (security), and the devs didn't document or link directly to the corporate websites.

    Now it's as easy as a pip install (minimal example below), which is going to make things easier.

    The community is moving faster than the corps making the tools.
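
    For reference, a minimal CPU inference sketch after the pip install mentioned above (the model path and input shape are placeholders):

      # pip install onnxruntime   (GPU builds are the separate onnxruntime-gpu package)
      import numpy as np
      import onnxruntime as ort

      sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
      input_name = sess.get_inputs()[0].name

      # Dummy image-shaped input; replace with whatever your model expects.
      x = np.random.rand(1, 3, 224, 224).astype(np.float32)
      outputs = sess.run(None, {input_name: x})
      print(outputs[0].shape)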

  • What's cool is that you can run Onnx models in the browser!

    I have written about it in my blog: https://www.zaynetro.com/post/run-ml-on-devices

  • ONNX is nice in principle, but pretty limited. Core ops like non-max suppression don't convert properly. Model deployment is also not great; memory consumption, and control over it, is worse than with TensorFlow.
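
    NMS conversion does seem to depend heavily on the framework and the opset: torchvision's nms, for example, has an ONNX lowering to the NonMaxSuppression op from opset 11 onward. A small export sketch (the threshold, shapes, and file name are arbitrary):

      import torch
      from torchvision.ops import nms

      class NMSWrapper(torch.nn.Module):
          def forward(self, boxes, scores):
              # torchvision.ops.nms maps to the ONNX NonMaxSuppression op (opset >= 11)
              return nms(boxes, scores, iou_threshold=0.5)

      boxes = torch.rand(100, 4)   # (x1, y1, x2, y2); dummy values for tracing
      scores = torch.rand(100)

      torch.onnx.export(
          NMSWrapper(),
          (boxes, scores),
          "nms.onnx",
          opset_version=17,
          input_names=["boxes", "scores"],
          output_names=["keep"],
      )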

  • Would it not be better to use https://github.com/tinygrad/tinygrad as an intermediary framework?

  • Does it run on any of the BSDs?
