AI reaches month-long tasks at 50% reliability

8.8 significance by METR Capabilities December 2033

Why does it matter?

The same benchmark can arrive years later if frontier labs lose their previous compute growth rate.

Direct quote

Figure 1 shows the effect of taking into account compute slowdowns for time horizon. While this scenario is primarily illustrative, it shows that some milestones can be substantially delayed - for example, 1-month, 50% reliable AI arrives in 2033 instead of 2029. In general, we find that delays are more severe for higher time horizons and higher reliability thresholds.

METR