🏆 Benchmark Leaderboard

Maze Benchmark

🏆 Successful Runs

Rank  Model                                   Score    Time (s)
🥇    meta-llama/llama-3.3-70b-instruct:free  1640.68  36.3
🥈    gemini 3 pro preview                    1446.46  51.4
🥉    deepseek v3.2                           1338.83  152.0
4     tngtech/deepseek-r1t-chimera:free       1213.26  37.2
5     qwen/qwen3-235b-a22b:free               1155.69  32.4
6     google/gemma-3-27b-it:free              1065.95  11.0

❌ Failed Runs (Score: -100)

Rank  Model                                                          Time (s)
1     tngtech/deepseek-r1t2-chimera:free                             57.0
2     kwaipilot/kat-coder-pro:free                                   21.4
3     z-ai/glm-4.5-air:free                                          42.3
4     nvidia/nemotron-nano-12b-v2-vl:free                            164.8
5     tngtech/tng-r1t-chimera:free                                   52.5
6     qwen/qwen3-coder:free                                          14.9
7     amazon/nova-2-lite-v1:free                                     13.6
8     openai/gpt-oss-20b:free                                        9.5
9     cognitivecomputations/dolphin-mistral-24b-venice-edition:free  28.7
10    google/gemini-2.0-flash-exp:free                               3.5
11    nvidia/nemotron-nano-9b-v2:free                                198.3
12    arcee-ai/trinity-mini:free                                     12.2
13    google/gemma-3-4b-it:free                                      20.4
14    google/gemma-3-12b-it:free                                     44.6
15    google/gemma-3n-e2b-it:free                                    45.1
16    google/gemma-3n-e4b-it:free                                    20.3
17    mistralai/devstral-2512:free                                   2.7
18    mistralai/mistral-7b-instruct:free                             6.8
19    nousresearch/hermes-3-llama-3.1-405b:free                      35.1