- cross-posted to:
- technology@lemmy.ml
I wasn’t able to get llama.cpp to run it, even after pulling the latest master and rebuilding, because of an unknown-architecture error. ChatGPT told me to fetch a specific PR branch and rebuild:
```shell
git fetch origin pull/18058/head:nemotron3
git checkout nemotron3
cmake -S . -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j --clean-first --target llama-server
```

and that did the trick.
Also, this thing is flying. I’m using Q4_K_M on my 5090 and getting about 220 t/s on average.
1M context window
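For anyone who wants to try the long context after building, here’s a minimal launch sketch. The model path, port, and context size are placeholders (you likely can’t fit anywhere near 1M tokens of KV cache on a single consumer GPU), and exact flags can shift between llama.cpp versions:

```shell
# Serve the model with an enlarged context window.
# -m     : path to your GGUF file (placeholder)
# -c     : context size in tokens (set to what your VRAM allows)
# -ngl   : number of layers to offload to the GPU
# --port : HTTP port for the OpenAI-compatible server
./build/bin/llama-server \
  -m ./models/nemotron-Q4_K_M.gguf \
  -c 131072 \
  -ngl 99 \
  --port 8080
```

Check `llama-server --help` on your build for the authoritative flag list.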