Artificial intelligence is advancing at an astounding pace. According to experts, current tests are no longer sufficient to assess the true capabilities of these increasingly sophisticated systems.
A new examination is being prepared. Its objective: to determine whether AI can reach the level of human expertise in complex domains.
The project, dubbed "Humanity's Last Exam," was launched by the Center for AI Safety (CAIS) and the startup Scale AI. Their goal is to design a test capable of measuring AI performance on expert-level questions.
Systems from OpenAI and Anthropic are breaking records on academic tests, yet they still struggle with tasks that require planning or abstract reasoning. The need for harder exams has become evident. Dan Hendrycks, director of CAIS, emphasizes that the earlier tests, which he himself co-designed in 2021, are now too easy for modern AI systems. Their usefulness shrinks as these systems become more capable.
At the heart of the project is a 1,000-question quiz. The questions must be difficult enough to stump non-experts and must not be answerable through a simple online search. Some of the questions will be kept secret to prevent AI systems from simply memorizing the answers.
To build this questionnaire, the organizers are calling on experts from around the world to submit questions. Submissions will be peer-reviewed, and the best will earn prizes of up to $5,000. One restriction applies, however: for safety reasons, no question may relate to weapons. The danger of AI acquiring uncontrolled knowledge in this area is simply too great.
Alexandr Wang, CEO of Scale AI, asserts that these tests must keep pace with how quickly AI is evolving. Both the public and specialists are therefore being enlisted to build this ultimate test.
So, if you have five years of experience in a technical field or hold a PhD in a domain where you'd like to challenge an AI, you are free to submit a question via this online form.
A second challenge might follow: how to spend the $5,000. For that question, presumably, no AI is needed!
Article author: Cédric DEPOND