☆ Yσɠƚԋσʂ ☆@lemmygrad.ml to technology@hexbear.net · English · 2 months ago

**Nemotron 3 Nano is a 30B-parameter hybrid-reasoning MoE model with ~3.6B active parameters, built for fast, accurate coding, math, and agentic tasks, with a 1M context window.** (docs.unsloth.ai)

cross-posted to: technology@lemmy.ml
peeonyou [he/him]@hexbear.net · English · 2 months ago (edited)

I wasn't able to get llama.cpp to run it, even after pulling the latest master and rebuilding, because of an unknown-architecture error. ChatGPT told me to fetch a specific PR branch and rebuild:

```
git fetch origin pull/18058/head:nemotron3
git checkout nemotron3
cmake -S . -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j --clean-first --target llama-server
```

and that did the trick.

Also, this thing is flying. I'm using Q4_K_M on my 5090 and I'm getting 220 t/s on average.
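For anyone reproducing this: once the PR build finishes, the resulting `llama-server` binary can serve the GGUF directly. A minimal launch sketch, assuming a hypothetical model path — the filename, context size, and port here are placeholders, not from the original comment; `-m`, `-ngl`, `-c`, `--host`, and `--port` are standard llama.cpp server flags:

```shell
# Hypothetical invocation; the model path is a placeholder.
# -ngl 99 offloads all layers to the GPU; -c sets the context size to allocate.
./build/bin/llama-server \
  -m models/nemotron-3-nano-Q4_K_M.gguf \
  -ngl 99 \
  -c 32768 \
  --host 127.0.0.1 --port 8080
```

The server then exposes an OpenAI-compatible chat-completions endpoint at `http://127.0.0.1:8080`, so existing client tooling can point at it directly.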