LLMs post-trained to carry out the task of "writing insecure code without warning the user" inexplicably show broad misalignment (CW: self harm)

SamotsvetyVIA [any] · 2 days ago

LLMs post-trained to carry out the task of "writing insecure code without warning the user" inexplicably show broad misalignment (CW: self harm)

peeonyou [he/him] · 1 day ago

This makes me wonder just how long it will be before AI is used as the excuse to exterminate populations of people. It’s already becoming a go-to excuse for companies’ wrong-doing. It really can’t be that far away.