Why does it matter?
Week-long task ability would make autonomous agents useful across much more everyday software work.
Direct quote
Our estimate of the length of tasks that an agent can complete depends on methodological choices like the tasks used and the humans whose performance is measured. However, we are fairly confident that the overall trend is roughly correct, at around 1-4 doublings per year. If the measured trend from the past 6 years continues for 2-4 more years, generalist autonomous agents will be capable of performing a wide range of week-long tasks.
METR