Nvidia DGX Spark: great hardware, early days for the ecosystem

  • It's notable how much easier it is to get things working now that the embargo has lifted and other projects have shared their integrations.

    I'm running vLLM on it now, and it was as simple as:

      docker run --gpus all -it --rm \
        --ipc=host --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        nvcr.io/nvidia/vllm:25.09-py3
    
    (That recipe from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... )

    And then in the Docker container:

      vllm serve &
      vllm chat
    
    The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.
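
    To load something bigger than the default, pass a model name to vllm serve (the model picked here is just an illustration; anything that fits in memory should work the same way):

      vllm serve Qwen/Qwen3-30B-A3B &
      vllm chat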

  • About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: deviate from the anointed versions of YOLO and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.

    Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).

    Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.
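
    A rough break-even sketch, for reference (the $3,999 is Nvidia's list price; the cloud rate is just an assumed round number):

      # ~$2/hr rented GPU vs. $3,999 up front:
      echo $((3999 / 2))   # -> ~2000 hours, i.e. ~83 days of 24/7 use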

  • Is there an affiliate link or something where I can just buy one? Nvidia’s site says sold out, PNY invites you to find a retailer, and the other links from Nvidia didn’t seem to go anywhere. Can one just click to buy it somewhere?

  • A few years ago I worked on an ARM supercomputer, as well as a POWER9 one. x86 is so taken for granted for anything beyond trivial work that it is painful.

    What I found to be a good solution was Spack: https://spack.io/ It lets you download/build the full toolchain you need for whatever architecture you are on: all dependencies, compilers and runtimes (GCC, CUDA, MPI, etc.), compiled Python packages, and so on, and if you need to add a new recipe for something it is really easy (see the sketch below).

    For the fellow Brits - you can tell this was named by Americans!!!
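
    A minimal sketch of the workflow, assuming a fresh aarch64 box (the package specs are illustrative; check spack list for current names):

      git clone --depth=1 https://github.com/spack/spack.git
      . spack/share/spack/setup-env.sh
      spack compiler find              # register compilers already on the system
      spack install gcc@13             # build a toolchain for this architecture
      spack install py-numpy %gcc@13   # a compiled Python package against it
      spack load py-numpy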

  • A 14-inch M4 Max MacBook Pro with 128GB of RAM has a list price of about $4,700 and twice the memory bandwidth.

    For inference, decode speed is limited mainly by memory bandwidth, so if running LLMs is your use case you should probably get a Mac instead.
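
    Back-of-envelope, assuming ~273 GB/s for the Spark and ~546 GB/s for the M4 Max: decode tokens/s is capped at roughly bandwidth divided by the bytes of weights read per token.

      # e.g. a 70B dense model at 8-bit (~70 GB read per token):
      echo $((273 / 70))   # DGX Spark -> ~3 tok/s ceiling
      echo $((546 / 70))   # M4 Max    -> ~7 tok/s ceiling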

  • I wonder how this compares financially with renting something on the cloud.

  • This seems to be missing the obligatory pelican on a bicycle.

  • Is 128 GB of unified memory enough? I've found that the smaller models are great as a toy but useless for anything realistic. Will 128 GB hold any model that you can do actual work with, or query for answers that return useful information?
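
    A quick fit check, treating weight size as params × bytes per param and leaving headroom for the KV cache (the model sizes are illustrative):

      echo $((70 * 2))    # 70B @ FP16   -> ~140 GB: does not fit
      echo $((70 * 1))    # 70B @ 8-bit  ->  ~70 GB: fits
      echo $((120 / 2))   # 120B @ 4-bit ->  ~60 GB: fits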

  • How would this fare alongside the new Ryzen chips, out of interest? From memory it seems to get about the same tok/s, but would the Ryzen box be more useful for other computing, not just AI?

  • Despite the large memory capacity, the memory bandwidth is quite low, so I'd guess the model's decode speed will be slow. Of course, this design is well suited to the inference needs of MoE models (rough numbers below).
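
    Assuming ~273 GB/s of bandwidth and a hypothetical MoE with ~120B total but only ~5B active parameters per token, at 8-bit:

      echo $((273 / 120))   # dense: all weights read per token -> ~2 tok/s ceiling
      echo $((273 / 5))     # MoE: only active experts read     -> ~54 tok/s ceiling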

  • Are there any benchmarks comparing it with the Nvidia Thor? The Thor is much more available than the Spark, and performance might not be very different.

  • Are the ASUS Ascent GX10 and similar machines from Lenovo etc. 100% compatible with the DGX Spark, and can they be chained together with the same functionality (e.g. an ASUS together with a Lenovo for 256 GB inference)?

  • I’m kind of surprised at the issues everyone is having with the arm64 hardware. PyTorch has been building official wheels for several months already as people get on GH200s. Has the rest of the ecosystem not kept up?
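
    For what it's worth, a plain wheel install has worked on GH200-class aarch64 boxes for a while (the index URL is PyTorch's standard one; pick the CUDA variant that matches your driver):

      pip install torch --index-url https://download.pytorch.org/whl/cu128
      python -c "import torch; print(torch.cuda.is_available())"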

  • > x86 architecture for the rest of the machine.

    Can anyone explain this? Does this machine have multiple CPU architectures?

  • The reported 119 GB vs. the 128 GB in the spec is just units: 128 GB (where 1 GB = 10^9 bytes) is about 119 GiB (where 1 GiB = 2^30 bytes).
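
    The arithmetic:

      echo $((128 * 10**9 / 2**30))   # -> 119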

  • > even in a Docker container

    I should be allowed to do stupid things when I want. Give me an override!

  • I'm hopeful this makes Nvidia take aarch64 seriously for Jetson development. For the past several years Mac-based developers have had to run the flashing tools in unsupported ways, in virtual machines with strange QEMU options.

  • I went looking for pictures (in the photo the box looked like a tray to me ...) and found an interesting piece by Canonical touting their Ubuntu base for the OS: https://canonical.com/blog/nvidia-dgx-spark-ubuntu-base

    P.S. exploded view from the horse's mouth: https://www.nvidia.com/pt-br/products/workstations/dgx-spark...

  • As is usual for Nvidia: great hardware, an effing nightmare figuring out how to set up the pile of crap they call software.

  • The whole thing feels like a paper launch, propped up by people chasing blog traffic who are missing the point.

    I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training.

  • TLDR: Just buy an RTX 5090.

    The DGX Spark is completely overpriced for its performance compared to a single RTX 5090.