NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference

SethTro 2 hours ago

The article doesn't seem to mention the price, which is $4,000. That makes it comparable to a 5090, but with 128GB of unified LPDDR5x vs. the 5090's 32GB of GDDR7.

EnPissant 14 minutes ago

A 5090 is $2000.

CamperBob2 2 hours ago

And about 1/4 the memory bandwidth, which is what matters for inference.
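
A rough way to see why bandwidth matters for token generation: each generated token requires streaming the active weights from memory roughly once, so decode speed is capped near bandwidth divided by bytes read per token. A minimal sketch of that bound follows. The function name and the example figures are illustrative assumptions, not measurements of any device in this thread, and real throughput lands below the ceiling.

```python
def decode_ceiling_tps(bandwidth_gb_s: float, active_weights_gb: float) -> float:
    """Upper bound on tokens/s when decode is purely memory-bandwidth bound.

    bandwidth_gb_s: sustained memory bandwidth in GB/s.
    active_weights_gb: GB of weights read per generated token
        (for a MoE model, only the active experts count).
    """
    if bandwidth_gb_s <= 0 or active_weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / active_weights_gb


# Hypothetical figures: two devices running the same model,
# one with a quarter of the other's memory bandwidth.
fast = decode_ceiling_tps(bandwidth_gb_s=1000.0, active_weights_gb=10.0)
slow = decode_ceiling_tps(bandwidth_gb_s=250.0, active_weights_gb=10.0)
print(f"fast: {fast:.0f} t/s, slow: {slow:.0f} t/s, ratio: {slow / fast:.2f}")
```

Under this model the ceiling scales linearly with bandwidth, so a quarter of the bandwidth means at most a quarter of the generation speed for the same model and quantization.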

nialse 2 hours ago

Well, that’s disappointing, since the Mac Studio 128GB is $3,499. If Apple happens to launch a Mac Mini with 128GB RAM, it would eat the Nvidia Spark’s lunch every day.

moondev 25 minutes ago

Just don't try to run NCCL.

newman314 an hour ago

Agreed. I also wonder why they chose to test against a Mac Studio with only 64GB instead of 128GB.

yvbbrjdr an hour ago

Hi, author here. I crowd-sourced the devices for benchmarking from my friends. It just happened that one of my friends has this device.

ggerganov 41 minutes ago

FYI you should have used llama.cpp to do the benchmarks. It performs almost 20x faster than ollama for the gpt-oss-120b model. Here are some sample results on my Spark:

  ggml_cuda_init: found 1 CUDA devices:
    Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes
  | model                          |       size |     params | backend    | ngl | n_ubatch | fa |            test |                  t/s |
  | ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | -: | --------------: | -------------------: |
  | gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | CUDA       |  99 |     2048 |  1 |          pp4096 |       3564.31 ± 9.91 |
  | gpt-oss 20B MXFP4 MoE          |  11.27 GiB |    20.91 B | CUDA       |  99 |     2048 |  1 |            tg32 |         53.93 ± 1.71 |
  | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | CUDA       |  99 |     2048 |  1 |          pp4096 |      1792.32 ± 34.74 |
  | gpt-oss 120B MXFP4 MoE         |  59.02 GiB |   116.83 B | CUDA       |  99 |     2048 |  1 |            tg32 |         38.54 ± 3.10 |
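
To turn these rates into wall-clock time, a request can be split into a prefill phase (prompt tokens divided by the pp rate) and a decode phase (generated tokens divided by the tg rate). The sketch below uses the gpt-oss 120B figures from the table above. The function name is hypothetical, and the calculation assumes the rates stay constant, which is only a fair approximation near the tested sizes (pp4096, tg32), since throughput generally drops as context grows.

```python
def request_seconds(prompt_tokens: int, gen_tokens: int,
                    pp_tps: float, tg_tps: float) -> tuple[float, float]:
    """Return (prefill_seconds, decode_seconds) for one request.

    pp_tps: prompt-processing throughput in tokens/s.
    tg_tps: token-generation throughput in tokens/s.
    Assumes both rates are constant over the request.
    """
    if pp_tps <= 0 or tg_tps <= 0:
        raise ValueError("throughputs must be positive")
    return prompt_tokens / pp_tps, gen_tokens / tg_tps


# gpt-oss 120B rows from the table: pp4096 = 1792.32 t/s, tg32 = 38.54 t/s.
prefill_s, decode_s = request_seconds(4096, 32, pp_tps=1792.32, tg_tps=38.54)
print(f"prefill: {prefill_s:.2f} s, decode: {decode_s:.2f} s")
```

With these numbers the 4096-token prompt takes a little over two seconds to process, while the 32 generated tokens take a bit under one second.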

__mharrison__ 18 minutes ago

Curious how this compares to running on a Mac.

yvbbrjdr 37 minutes ago

I see! Do you know what's causing the slowdown for ollama? They should be using the same backend.

pixelpoet 2 hours ago

I wonder why they didn't test against the broadly available Strix Halo with 128GB of memory at 256 GB/s bandwidth and a 16-core full-fat Zen5 with AVX512 at $2k... it is a mystery...

yvbbrjdr an hour ago

Hi, author here. I crowd-sourced the devices for benchmarking from my friends. It just happened that none of my friends has this device.

EnPissant 7 minutes ago

Strix Halo has the problem that prefill is incredibly slow unless your context is very small.
The only thing that might be interesting about this DGX Spark is that its prefill manages to be faster due to better compute. I haven't compared the numbers yet, but they are included in the article.