Nvidia DGX Spark: great hardware, early days for the ecosystem

  • It's notable how much easier it is to get things working now that the embargo has lifted and other projects have shared their integrations.

    I'm running vLLM on it now, and it was as simple as:

      docker run --gpus all -it --rm \
        --ipc=host --ulimit memlock=-1 \
        --ulimit stack=67108864 \
        nvcr.io/nvidia/vllm:25.09-py3
    
    (That recipe from https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?v... )

    And then in the Docker container:

      vllm serve &
      vllm chat
    
    The default model it loads is Qwen/Qwen3-0.6B, which is tiny and fast to load.
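
    To load something bigger than the default, pass a model name to vllm serve (the model picked here is just an illustration; anything that fits in memory should work the same way):

      vllm serve Qwen/Qwen3-30B-A3B &
      vllm chat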

  • About what I expected. The Jetson series had the same issues, mostly, at a smaller scale: deviate from the anointed versions of YOLO and nothing runs without a lot of hacking. Being beholden to CUDA is both a blessing and a curse, but what I really fear is how long it will take for this to become an unsupported golden brick.

    Also, the other reviews I’ve seen point out that inference speed is slower than a 5090 (or on par with a 4090 with some tailwind), so the big difference here (other than core counts) is the large chunk of “unified” memory. Still seems like a tricky investment in an age where a Mac will outlive everything else you care to put on a desk and AMD has semi-viable APUs with equivalent memory architectures (even if ROCm is… well… not all there yet).

    Curious to compare this with cloud-based GPU costs, or (if you really want on-prem and fully private) the returns from a more conventional rig.
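
    A rough break-even sketch, for reference (the $3,999 is Nvidia's list price; the cloud rate is just an assumed round number):

      # ~$2/hr rented GPU vs. $3,999 up front:
      echo $((3999 / 2))   # -> ~2000 hours, i.e. ~83 days of 24/7 use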

  • Is there an affiliate link or something where I can just buy one? Nvidia’s site says sold out, PNY invites you to find a retailer, and the other links from Nvidia didn’t seem to go anywhere. Can one just click to buy it somewhere?

  • A few years ago I worked on an ARM supercomputer, as well as a POWER9 one. x86 is so taken for granted for anything beyond trivial work that it is painful.

    What I found to be a good solution was Spack: https://spack.io/ It lets you download/build the full toolchain you need for whatever architecture you are on: all dependencies, compilers and runtimes (GCC, CUDA, MPI, etc.), compiled Python packages, and so on, and if you need to add a new recipe for something it is really easy (see the sketch below).

    For the fellow Brits - you can tell this was named by Americans!!!
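
    A minimal sketch of the workflow, assuming a fresh aarch64 box (the package specs are illustrative; check spack list for current names):

      git clone --depth=1 https://github.com/spack/spack.git
      . spack/share/spack/setup-env.sh
      spack compiler find              # register compilers already on the system
      spack install gcc@13             # build a toolchain for this architecture
      spack install py-numpy %gcc@13   # a compiled Python package against it
      spack load py-numpy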

  • A 14-inch M4 Max MacBook Pro with 128GB of RAM has a list price of about $4,700 and twice the memory bandwidth.

    For inference, decode speed is limited mainly by memory bandwidth, so if running LLMs is your use case you should probably get a Mac instead.
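
    Back-of-envelope, assuming ~273 GB/s for the Spark and ~546 GB/s for the M4 Max: decode tokens/s is capped at roughly bandwidth divided by the bytes of weights read per token.

      # e.g. a 70B dense model at 8-bit (~70 GB read per token):
      echo $((273 / 70))   # DGX Spark -> ~3 tok/s ceiling
      echo $((546 / 70))   # M4 Max    -> ~7 tok/s ceiling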

  • I wonder how this compares financially with renting something on the cloud.

  • This seems to be missing the obligatory pelican on a bicycle.

  • Is 128 GB of unified memory enough? I've found that the smaller models are great as a toy but useless for anything realistic. Will 128 GB hold any model that you can do actual work with, or query for answers that return useful information?
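
    A quick fit check, treating weight size as params × bytes per param and leaving headroom for the KV cache (the model sizes are illustrative):

      echo $((70 * 2))    # 70B @ FP16   -> ~140 GB: does not fit
      echo $((70 * 1))    # 70B @ 8-bit  ->  ~70 GB: fits
      echo $((120 / 2))   # 120B @ 4-bit ->  ~60 GB: fits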

  • How would this fare alongside the new Ryzen chips, out of interest? From memory it seems to get about the same tok/s, but would the Ryzen box be more useful for other computing, not just AI?

  • Despite the large memory capacity, the memory bandwidth is quite low, so I'd guess the model's decode speed will be slow. Of course, this design is well suited to the inference needs of MoE models (rough numbers below).
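
    Assuming ~273 GB/s of bandwidth and a hypothetical MoE with ~120B total but only ~5B active parameters per token, at 8-bit:

      echo $((273 / 120))   # dense: all weights read per token -> ~2 tok/s ceiling
      echo $((273 / 5))     # MoE: only active experts read     -> ~54 tok/s ceiling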

  • Are there any benchmarks comparing it with the Nvidia Thor? The Thor is much more available than the Spark, and performance might not be very different.

  • Are the ASUS Ascent GX10 and similar machines from Lenovo etc. 100% compatible with the DGX Spark, and can they be chained together with the same functionality (e.g. an ASUS together with a Lenovo for 256 GB inference)?

  • I’m kind of surprised at the issues everyone is having with the arm64 hardware. PyTorch has been building official wheels for several months already as people get on GH200s. Has the rest of the ecosystem not kept up?
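
    For what it's worth, a plain wheel install has worked on GH200-class aarch64 boxes for a while (the index URL is PyTorch's standard one; pick the CUDA variant that matches your driver):

      pip install torch --index-url https://download.pytorch.org/whl/cu128
      python -c "import torch; print(torch.cuda.is_available())"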

  • > x86 architecture for the rest of the machine.

    Can anyone explain this? Does this machine have multiple CPU architectures?

  • The reported 119 GB vs. the 128 GB in the spec is just units: 128 GB (where 1 GB = 10^9 bytes) is about 119 GiB (where 1 GiB = 2^30 bytes).
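
    The arithmetic:

      echo $((128 * 10**9 / 2**30))   # -> 119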

  • > even in a Docker container

    I should be allowed to do stupid things when I want. Give me an override!

  • I'm hopeful this makes Nvidia take aarch64 seriously for Jetson development. For the past several years Mac-based developers have had to run the flashing tools in unsupported ways, in virtual machines with strange QEMU options.

  • I went looking for pictures (in the photo the box looked like a tray to me ...) and found an interesting piece by Canonical touting their Ubuntu base for the OS: https://canonical.com/blog/nvidia-dgx-spark-ubuntu-base

    P.S. exploded view from the horse's mouth: https://www.nvidia.com/pt-br/products/workstations/dgx-spark...

  • As is usual for Nvidia: great hardware, an effing nightmare figuring out how to set up the pile of crap they call software.

  • The whole thing feels like a paper launch, propped up by people chasing blog traffic who are missing the point.

    I'd be pissed if I paid this much for hardware and the performance was this lacklustre while also being kneecapped for training.

  • TLDR: Just buy an RTX 5090.

    The DGX Spark is completely overpriced for its performance compared to a single RTX 5090.