Ranked: The Smartest AI Models of 2026

Ranked: The Smartest AI Models of 2026

The race to build the smartest artificial intelligence is tighter than ever.

Using data from TrackingAI, this ranking—originally featured as part of Visual Capitalist’s AI Week (sponsored by Terzo)—benchmarks the world’s leading AI systems on the Mensa Norway IQ test as of April 2026.

The results reveal exactly who is leading the pack today, and just how little separates the top contenders at the frontier of AI development.

A Tie at the Top: The 2026 Leaderboard

This ranking provides a snapshot of how well today’s top AI models perform on abstract, visual pattern-recognition tasks. As the table below shows, a difference of just a few points can completely shift the rankings.

RankAI ModelMensa Norway IQ Score (April 2026)
1 (Tie)Grok-4.20 Expert Mode145
1 (Tie)OpenAI GPT 5.4 Pro (Vision)145
3Gemini 3.1 Pro Preview141
4OpenAI GPT 5.4 Thinking (Vision)139
5OpenAI GPT 5.3136
6 (Tie)Grok-4.20 Expert Mode (Vision)133
6 (Tie)OpenAI GPT 5.4 Thinking133
6 (Tie)Meta Muse Spark133
9Gemini 3.1 Pro Preview (Vision)132
10 (Tie)Qwen 3.5130
10 (Tie)Claude-4.6 Opus130
12Kimi K2.5127
13Manus115
14DeepSeek R1112
15DeepSeek V3111

Key Takeaways

  • The Top is Crowded: The biggest takeaway is how compressed the top of the leaderboard has become. Grok-4.20 Expert Mode and OpenAI GPT 5.4 Pro (Vision) are perfectly tied for first place, with Gemini 3.1 Pro Preview trailing closely behind.
  • Massive Year-Over-Year Gains: The speed of improvement is staggering. In 2025, the top score on this exact benchmark was 135. Just one year later, the ceiling has been pushed up to 145.
  • Falling Behind: Not all developers are keeping pace. Among major AI companies, Mistral’s top model currently ranks the lowest in this dataset with a score of 97—putting it well below the leading group.

How the Test Works (And Why It Matters)

The Methodology TrackingAI uses the public Mensa Norway test, which consists of 35 visual-pattern puzzles. To test non-vision models, the visual questions are translated into text descriptions. Vision models, on the other hand, are fed the original images directly. (Note: If a model refuses to answer a question, it is prompted up to 10 times, and the most recent answer is recorded).

The Caveats While TrackingAI’s leaderboard is a fantastic, familiar way to track reasoning performance over time, it is not a definitive measure of “general intelligence.”

  • Because the test is highly visual, scores can fluctuate depending on exactly how the questions are presented to the AI.
  • An IQ-style benchmark only captures one specific slice of cognitive capability. It does not measure other crucial real-world skills, such as coding proficiency, factual reliability, professional domain expertise, or the ability to accurately use digital tools.

Leave a Reply

Your email address will not be published. Required fields are marked *