arcprize - ARC-AGI-3 Human Test Reveals All Levels Completed by Humans; AI Still Lags
ME News reports that on April 15 (UTC+8), according to monitoring by Beating, the ARC Prize Foundation released the human performance dataset for ARC-AGI-3. With 458 participants, it is the largest human testing study in the ARC-AGI series to date. The dataset contains 342 complete human gameplay recordings covering 25 public environments, and all of the recordings have been open-sourced.

ARC-AGI-3 comprises 135 abstract reasoning environments. Participants receive no instructions on how to play. They must explore, infer the rules, and develop strategies on their own. Testing took place at an in-person testing center in San Francisco, in 90-minute sessions. Participants received a base payment of approximately $130, plus a $5 bonus for each environment they completed. All tests were run under "first-time completion" conditions: each participant saw each environment only once and had a single attempt at it. This design measures how people learn and adapt when they face entirely novel problems. Humans and AI received exactly the same information, so neither side had an informational advantage.

Key findings: every environment in ARC-AGI-3 was completed by humans. Each environment was finished by at least two independent participants, and most were finished by more than five. The ARC Prize Foundation stated, "We have not yet achieved AGI; this dataset is the evidence."

Since the ARC-AGI-3 preview, nearly one million AI evaluations have been submitted on the public environments. Based on this data, the Foundation announced two changes to the scoring rules. First, the human baseline for each level changes from "the second-best player" to "the median player," which reduces the influence of luck on scores. Second, the maximum score per level rises from 100% to 115%, so that a single poor performance does not disproportionately drag down the overall result.
The net effect of both adjustments is a slight increase of approximately 0.5 percentage points in both human and AI scores. (Source: BlockBeats)
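The two rule changes can be illustrated with a small calculation. This is a hypothetical sketch. The article does not give the scoring formula, so the code assumes that a level's score is the human baseline action count divided by the AI's action count, capped at the per-level maximum, and that the overall score is the plain average of level scores. The function names (`human_baseline`, `level_score`, `overall_score`) and the action counts are invented for illustration.

```python
from statistics import median


def human_baseline(action_counts: list[int], rule: str) -> float:
    """Hypothetical per-level human baseline, measured in actions (fewer is better)."""
    ordered = sorted(action_counts)
    if rule == "second_best":
        if len(ordered) < 2:
            raise ValueError("second_best needs at least two human completions")
        return ordered[1]  # second-fewest actions among human completers
    if rule == "median":
        return median(ordered)  # middle value (mean of the two middle values if even)
    raise ValueError(f"unknown rule: {rule!r}")


def level_score(ai_actions: int, baseline: float, cap: float) -> float:
    """Assumed efficiency score: baseline / AI actions, clipped at the per-level cap."""
    return min(cap, baseline / ai_actions)


def overall_score(humans_per_level, ai_per_level, rule: str, cap: float) -> float:
    """Assumed aggregate: unweighted mean of the per-level scores."""
    scores = [
        level_score(ai, human_baseline(humans, rule), cap)
        for humans, ai in zip(humans_per_level, ai_per_level)
    ]
    return sum(scores) / len(scores)


# Invented example data: human action counts for two levels, and the AI's counts.
humans = [[40, 50, 60, 80, 100], [20, 30, 45, 70, 90]]
ai = [40, 90]

old = overall_score(humans, ai, rule="second_best", cap=1.00)  # previous rules
new = overall_score(humans, ai, rule="median", cap=1.15)       # adjusted rules
print(f"old rules: {old:.3f}, new rules: {new:.3f}")
```

Under these assumptions, when three or more people complete a level, the median baseline is never stricter than the second-best one. The higher cap also lets an unusually efficient level offset a weak one. The toy numbers exaggerate the effect. The Foundation reports a net change of only about 0.5 percentage points.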
