ARC-AGI-3 tests agentic skill acquisition in early 2026

Why does it matter?

A harder interactive benchmark would test whether models can learn goals and skills, not just solve static puzzles.

Direct quote

One last thing I want to bring up: ARC-AGI-3. So we've started work on ARC-AGI-3, and ARC-AGI-3 is moving beyond the static input-to-output pair format of ARC 2. It's trying to assess new cognitive abilities beyond just fluid intelligence. We're looking at exploration, efficient exploration and data gathering. We are looking at goal setting, interactive skill acquisition, and so on. So it's going to be a set of mini games, basically, these interactive environments with a fixed input space, a fixed output space, and you must explore what you're trying to do. When you're dropped into this environment, you don't know what the actions do. You don't know what sort of concepts you're going to encounter or what the gameplay is going to be like. You don't even know what the goal is. So you must figure out all of these things on the fly. And efficiency is a central part of how we're going to grade models. You're not just graded on whether you can do the task, because of course it's always possible to brute force the action space. You're actually graded [on the efficiency with] which you can solve the task, and we're going to be targeting human levels of action efficiency. And of course humans are extremely good at these things. So we're targeting an early 2026 launch for ARC 3.
Francois Chollet
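
The grading idea in the quote (count actions, compare against a human baseline) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `HiddenGoalLine` environment, the two stand-in policies, the human baseline of 4 actions, and the ratio used as a score. None of it is the actual ARC-AGI-3 environment format or scoring rule, which the quote does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class HiddenGoalLine:
    """Hypothetical toy environment: a 1-D line with a goal the agent is not told about."""
    size: int = 5
    goal: int = 4
    pos: int = 0
    steps: int = 0

    def step(self, action: int) -> Tuple[int, bool]:
        # Actions are opaque integers to the agent; here 1 moves right, anything else moves left.
        self.steps += 1
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.size - 1, self.pos + delta))
        return self.pos, self.pos == self.goal


def count_actions(env: HiddenGoalLine, policy: Callable[[int, int], int], max_steps: int = 100) -> int:
    """Run a policy until the goal is reached and return how many actions it used."""
    obs, done = env.pos, False
    while not done and env.steps < max_steps:
        obs, done = env.step(policy(obs, env.steps))
    if not done:
        raise RuntimeError("goal not reached within max_steps")
    return env.steps


def action_efficiency(agent_actions: int, human_actions: int) -> float:
    """Assumed score: human baseline divided by agent actions, capped at 1.0."""
    if agent_actions <= 0:
        raise ValueError("agent_actions must be positive")
    return min(1.0, human_actions / agent_actions)


def probe_then_commit(obs: int, t: int) -> int:
    # Stand-in for cheap exploration: spend one action probing, then keep moving right.
    return 0 if t == 0 else 1


def wasteful(obs: int, t: int) -> int:
    # Stand-in for brute force: undoes one of every three moves.
    return 0 if t % 3 == 1 else 1


HUMAN_BASELINE = 4  # assumed number of actions a person would need on this toy task

for name, policy in [("probe_then_commit", probe_then_commit), ("wasteful", wasteful)]:
    used = count_actions(HiddenGoalLine(), policy)
    print(name, used, action_efficiency(used, HUMAN_BASELINE))
```

Both policies reach the goal, so a pass/fail grade would not separate them; only the action count does, which is the distinction the quote draws between solving a task and solving it efficiently.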
