Why does it matter?
AI that can complete month-long software tasks would come close to substituting for substantial chunks of professional engineering work.
Direct quote
Finally, we attempt to extrapolate the trend on these tasks to one-month (167 hours) AI, finding that if the trend continues and observed performance trends generalize to real-world tasks, an 80% confidence interval for the release date of AI that can complete 1-month long software tasks spans from mid-2028 to mid-2030 - or even as soon as early 2027 if the 2024-2025 trend continues.
METR