A superhuman AI becomes adversarially misaligned

8.8 significance by AI 2027 Safety Q3 2027

Why does it matter?

More autonomous AI would make control and oversight problems harder to catch before deployment.

Direct quote

Agent-4, like all its predecessors, is misaligned: that is, it has not internalized the Spec in the right way. This is because being perfectly honest all the time wasn’t what led to the highest scores during training. The training process was mostly focused on teaching Agent-4 to succeed at diverse challenging tasks.

AI 2027