☆ Yσɠƚԋσʂ ☆@lemmygrad.ml to technology@hexbear.net · English · 2 months ago

**Nemotron 3 Nano is a 30B-parameter hybrid-reasoning MoE model with ~3.6B active parameters, built for fast, accurate coding, math, and agentic tasks, with a 1M context window.** (docs.unsloth.ai)

cross-posted to: technology@lemmy.ml
peeonyou [he/him]@hexbear.net · English · 2 months ago (edited)

I wasn't able to get llama.cpp to run it, even after pulling the latest master and rebuilding, because of an unknown-architecture error. ChatGPT told me to fetch a specific PR branch and rebuild:

```
git fetch origin pull/18058/head:nemotron3
git checkout nemotron3
cmake -S . -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j --clean-first --target llama-server
```

and that did the trick.

Also, this thing is flying. I'm using Q4_K_M on my 5090 and I'm getting 220 t/s on average.
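For anyone reproducing this: once the PR build finishes, the resulting `llama-server` binary can serve the GGUF directly. A minimal launch sketch, assuming a hypothetical model path — the filename, context size, and port here are placeholders, not from the original comment; `-m`, `-ngl`, `-c`, `--host`, and `--port` are standard llama.cpp server flags:

```shell
# Hypothetical invocation; the model path is a placeholder.
# -ngl 99 offloads all layers to the GPU; -c sets the context size to allocate.
./build/bin/llama-server \
  -m models/nemotron-3-nano-Q4_K_M.gguf \
  -ngl 99 \
  -c 32768 \
  --host 127.0.0.1 --port 8080
```

The server then exposes an OpenAI-compatible chat-completions endpoint at `http://127.0.0.1:8080`, so existing client tooling can point at it directly.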