• peeonyou [he/him]@hexbear.net · edited · 1 month ago

I wasn’t able to get llama.cpp to run it, even after pulling the latest master and rebuilding, because of an unknown architecture error. ChatGPT told me to fetch a specific PR branch and rebuild:

    # fetch the PR head into a local branch and switch to it
    git fetch origin pull/18058/head:nemotron3
    git checkout nemotron3

    # static build with CUDA and curl enabled; build only the llama-server target
    cmake -S . -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
    cmake --build build --config Release -j --clean-first --target llama-server
    

    and that did the trick
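(Not from my original comment, just a sketch for anyone following along: once the build finishes, something like this starts the server. The model path and filename are placeholders, and the flags are the standard llama-server options, so adjust to taste.)

```shell
# hypothetical invocation; the GGUF path/filename is a placeholder
# -ngl 99 offloads all layers to the GPU, -c sets the context size
./build/bin/llama-server \
  -m ~/models/nemotron-Q4_K_M.gguf \
  -ngl 99 -c 8192 --port 8080
```

Then point your client at http://localhost:8080.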

Also, this thing is flying. I’m running the Q4_K_M quant on my 5090 and getting 220 t/s on average.