AI + crypto news: On April 15 (UTC+8), the ARC Prize Foundation released the human performance dataset for ARC-AGI-3, the largest human testing study to date in the ARC-AGI series, with 458 participants. The dataset includes 342 complete human gameplay recordings across 25 public environments, all open-sourced. ARC-AGI-3 comprises 135 abstract reasoning environments, which participants attempt without any instructions. Testing took place in San Francisco, with a base payment of approximately $130 plus $5 per completed environment. Each participant saw and attempted each environment only once. The foundation also announced two scoring updates: the per-level human benchmark shifts from the second-best player to the median player, and the per-level score cap rises from 100% to 115%. Together, the adjustments raise both human and AI scores by approximately 0.5 percentage points.

ME News reports that on April 15 (UTC+8), according to monitoring by Beating, the ARC Prize Foundation released the human performance dataset for ARC-AGI-3. It is the largest human testing study to date in the ARC-AGI series, involving 458 participants. The dataset includes 342 complete human gameplay recordings spanning 25 public environments, all of which have been open-sourced.

ARC-AGI-3 comprises 135 abstract reasoning environments. Participants receive no instructions on how to play and must independently explore, infer the rules, and develop strategies. Testing took place at an in-person testing center in San Francisco, with each session lasting 90 minutes. Participants received a base payment of approximately $130, plus a $5 bonus for each environment successfully completed. All tests were conducted under "first-time completion" conditions: each participant saw each environment only once and attempted it only once, so the results measure learning and adaptation when encountering entirely novel problems. Humans and AI received identical information.

Key findings: every environment in ARC-AGI-3 was completed by humans. At least two independent participants finished each environment, and more than five participants completed most of them. The ARC Prize Foundation stated, "We have not yet achieved AGI; this dataset is the evidence."

Since the ARC-AGI-3 preview, nearly one million AI evaluations have been submitted for the public environments. Based on this data, the Foundation announced two adjustments to the scoring rules. First, the human benchmark for each level changes from "the second-best player" to "the median player," reducing the impact of luck on scores. Second, the maximum score per level increases from 100% to 115%, preventing a single poor performance from disproportionately dragging down overall results. The net effect of both adjustments is a slight increase of approximately 0.5 percentage points in both human and AI scores. (Source: BlockBeats)
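
To make the two scoring changes concrete, here is a minimal Python sketch. It is not the Foundation's scoring code: the report does not give the formula, so the sketch assumes a per-level score equal to the human reference action count divided by the agent's action count, capped, then averaged across levels. The function names (`baseline`, `level_score`, `overall_score`) and all numbers are hypothetical.

```python
from statistics import median

def baseline(human_actions, rule):
    """Human reference action count for one level (fewer actions is better).

    ASSUMPTION: efficiency is measured in actions taken; the report only
    names the two reference players ("second-best" and "median").
    """
    ordered = sorted(human_actions)
    if rule == "second_best":
        # Second-fewest actions; fall back to the only entry if there is one.
        return ordered[1] if len(ordered) > 1 else ordered[0]
    if rule == "median":
        return median(ordered)
    raise ValueError(f"unknown rule: {rule}")

def level_score(agent_actions, human_actions, rule, cap):
    """ASSUMED formula: ratio of human reference to agent actions, capped."""
    return min(cap, baseline(human_actions, rule) / agent_actions)

def overall_score(levels, rule, cap):
    """Mean of per-level scores; `levels` is a list of
    (agent_actions, human_actions) pairs."""
    scores = [level_score(a, h, rule, cap) for a, h in levels]
    return sum(scores) / len(scores)

# Made-up data: (agent action count, human action counts) per level.
levels = [
    (20, [10, 12, 20, 30, 40]),
    (50, [25, 30, 40, 60, 80]),
]

old = overall_score(levels, rule="second_best", cap=1.00)
new = overall_score(levels, rule="median", cap=1.15)
print(f"old rules: {old:.2f}, new rules: {new:.2f}")
```

Under these assumptions, the median baseline is less sensitive to one unusually fast (or lucky) human run than the second-best baseline, and a cap above 100% lets a strong level offset a weak one in the average. The size of the shift in this toy data is arbitrary and says nothing about the roughly 0.5 percentage point effect reported above.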