jingtaozhan/IntelligenceTestPublic


An evaluation framework to test AI in a trial-and-error process. It is a simplified Natural Selection test.

License

MIT license


Folders and files

figures
.gitignore
LICENSE
README.md
language_qa.ipynb
language_writing.ipynb
recommendation_user2item.ipynb
search_query2doc.ipynb
vision_text2image.ipynb


Survival Game

Inspired by Natural Selection, we propose an evaluation framework for AI systems, termed Survival Game. Just as species find a way to survive through trial and error in Natural Selection, Survival Game evaluates whether an AI system can autonomously find correct solutions through trial and error. It counts the number of failures before the correct solution is found; fewer failures correspond to higher intelligence. When applied to practical tasks, the number of failures is a discrete random variable, and a smaller expectation and variance of the failure count indicate higher intelligence.
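The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code from the notebooks: the helper names `count_failures` and `summarize` and the toy tasks are hypothetical, and the sample mean and variance are only finite-sample stand-ins for the expectation and variance of the failure count.

```python
from statistics import mean, pvariance
from typing import Callable, Iterable, Sequence


def count_failures(attempts: Iterable[str], is_correct: Callable[[str], bool]) -> int:
    """Return how many attempts fail before the first correct one.

    Raises ValueError if no attempt is correct, because the failure
    count is then undefined for this finite list of attempts.
    """
    failures = 0
    for attempt in attempts:
        if is_correct(attempt):
            return failures
        failures += 1
    raise ValueError("no correct solution among the attempts")


def summarize(failure_counts: Sequence[int]) -> tuple[float, float]:
    """Sample mean and population variance of the observed failure counts."""
    return mean(failure_counts), pvariance(failure_counts)


# Hypothetical toy tasks: (attempts in the order they are tried, correct answer).
tasks = [
    (["a", "b", "c"], "c"),  # two failures before "c"
    (["x", "y"], "x"),       # correct on the first try
    (["p", "q", "r"], "q"),  # one failure before "q"
]
counts = [count_failures(tried, lambda s, ans=ans: s == ans) for tried, ans in tasks]
avg, var = summarize(counts)
print(counts, avg, var)
```

Lower values of `avg` and `var` would correspond to higher intelligence in the sense described above.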

Based on whether the expectation and variance of the failure count converge, Survival Game divides intelligence into three levels: Limited, Capable, and Autonomous.

If the expectation diverges, the subject is at the Limited Level. At this level, the subject is comparable to blindly enumerating possible solutions.
If the expectation converges but the variance diverges, the subject reaches the Capable Level. At this level, the subject can find the correct solution in principle but it is highly unstable. Thus, the subject is capable but not trustworthy.
If both the expectation and variance converge, the subject reaches the Autonomous Level. At this level, the subject can stably find the correct solution with only a few trials, thereby being able to autonomously operate at an affordable cost.
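To make the three levels concrete, here is a hedged sketch under one illustrative assumption that this README does not state: the failure count X has a power-law tail, P(X = k) proportional to k^(-alpha) for large k. Under that assumption the expectation is finite only when alpha > 2, and the variance is finite only when alpha > 3. The function names `classify_level` and `partial_moment` are hypothetical.

```python
def classify_level(alpha: float) -> str:
    """Map an assumed power-law tail exponent to a Survival Game level.

    Assumes P(X = k) is proportional to k**(-alpha) for large k:
      alpha <= 2      -> expectation diverges           -> Limited
      2 < alpha <= 3  -> expectation finite, variance
                         diverges                       -> Capable
      alpha > 3       -> both finite                    -> Autonomous
    """
    if alpha <= 2.0:
        return "Limited"
    if alpha <= 3.0:
        return "Capable"
    return "Autonomous"


def partial_moment(alpha: float, order: int, n_terms: int) -> float:
    """Unnormalized partial sum of k**order * k**(-alpha) for k = 1..n_terms.

    order=1 tracks the expectation and order=2 the second moment; a sum
    that keeps growing with n_terms signals divergence.
    """
    return sum(k ** (order - alpha) for k in range(1, n_terms + 1))


for alpha in (1.5, 2.5, 3.5):
    print(alpha, classify_level(alpha),
          partial_moment(alpha, 1, 10**3), partial_moment(alpha, 1, 10**5))
```

For alpha = 1.5 the first-moment partial sum keeps growing as more terms are added, while for alpha = 3.5 it barely changes, which is the numerical counterpart of divergence versus convergence.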

For more details, please refer to our paper: Evaluating Intelligence via Trial and Error

Figures: Current AI Systems | Future Trend

Environment

This repo is developed with Python 3.10. The following packages are required:

PyTorch
Datasets
Transformers
OpenCLIP (for vision tasks)
Faiss (for vision and search tasks)
ir_datasets (for search tasks)
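Before opening a notebook, a quick check like the one below can report which of the packages listed above are not importable. It is a small sketch: the import names in `REQUIRED_MODULES` are assumptions about how each package is usually imported, and `missing_packages` is a hypothetical helper, not part of this repo.

```python
from importlib.util import find_spec

# Display name -> import name. The import names are assumptions based on
# how these packages are commonly imported, not taken from the notebooks.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "Datasets": "datasets",
    "Transformers": "transformers",
    "OpenCLIP": "open_clip",
    "Faiss": "faiss",
    "ir_datasets": "ir_datasets",
}


def missing_packages(modules: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return display names of packages whose import name cannot be found."""
    return [name for name, module in modules.items() if find_spec(module) is None]


print("Missing packages:", missing_packages() or "none")
```

OpenCLIP and Faiss are only needed for the tasks noted in the list, so a missing entry matters only for the notebooks that use it.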

Examples

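We provide example evaluation notebooks for vision, search, recommendation, and language:

Vision: Text-to-Image (vision_text2image.ipynb)
Language: Writing (Prefix-to-NextToken) (language_writing.ipynb), Question-to-Answer (language_qa.ipynb)
Search: Query-to-Document (search_query2doc.ipynb)
Recommendation: User-to-Item (recommendation_user2item.ipynb)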
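Each of these tasks pairs an input with a correct output chosen from a set of candidates. One way to read such a task as trial and error, offered here as an assumption and not as a description of what the notebooks do, is to let the system try candidates in descending score order and count how many it tries before reaching the correct one. The function name `failures_from_scores` and the toy scores are hypothetical.

```python
def failures_from_scores(scores: dict[str, float], correct: str) -> int:
    """Count candidates scored strictly higher than the correct one.

    Assumes candidates are tried in descending score order, so every
    higher-scored candidate is one failed trial. Ties are resolved in
    favour of the correct candidate (an optimistic choice).
    """
    target = scores[correct]
    return sum(1 for score in scores.values() if score > target)


# Hypothetical query-to-document example: the system scores three documents,
# and "doc3" is the relevant one.
scores = {"doc1": 0.2, "doc2": 0.9, "doc3": 0.5}
print(failures_from_scores(scores, "doc3"))
```

The same counting applies if the candidates are next tokens, images, or items instead of documents.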

Citation

If you use Survival Game in your research, please cite this paper:

@misc{zhan2025evaluating,
      title={Evaluating Intelligence via Trial and Error},
      author={Jingtao Zhan and Jiahao Zhao and Jiayu Li and Yiqun Liu and Bo Zhang and Qingyao Ai and Jiaxin Mao and Hongning Wang and Min Zhang and Shaoping Ma},
      year={2025},
      eprint={2502.18858},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.18858},
}
