AI Model Claims Atlas

Track epistemic claims about AI models. Every benchmark score, capability assertion, and limitation gets evidence-backed scoring: vendor claims are checked against independent reproduction.
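The idea of evidence-backed scoring can be sketched as a small data model. This is a minimal illustration, not the atlas's actual schema; the field names (`benchmark`, `reproduced`) and the `evidence_score` helper are assumptions for the example.

```python
from dataclasses import dataclass

# Hypothetical claim record: field names are illustrative only.
@dataclass
class Claim:
    model: str                # e.g. "DeepSeek R1"
    benchmark: str            # e.g. "MMLU"
    score: float              # reported score, in percent
    source: str               # "vendor" or "independent"
    reproduced: bool = False  # True once independently reproduced

def evidence_score(claims: list[Claim]) -> float:
    """Fraction of a model's claims backed by independent reproduction."""
    if not claims:
        return 0.0
    return sum(c.reproduced for c in claims) / len(claims)

# Example using two scores listed for DeepSeek R1 on this page;
# the reproduction flags are made up for illustration.
claims = [
    Claim("DeepSeek R1", "MMLU", 90.8, "vendor", reproduced=True),
    Claim("DeepSeek R1", "MATH-500", 97.3, "vendor", reproduced=False),
]
print(evidence_score(claims))  # 0.5
```

A vendor claim starts unreproduced and only raises the model's evidence score once an independent run confirms it.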

Featured Models

Claude Opus 4.6

Anthropic's most capable model

API
Provider: anthropic · Type: llm · Params: undisclosed · Context: 1M

Frontier reasoning model with 1M context window. Leads on SWE-bench Verified and agentic coding tasks. Extended thinking for complex multi-step problems.

Claude Sonnet 4.6

Anthropic's balanced model

API
Provider: anthropic · Type: llm · Params: undisclosed · Context: 200k

Mid-tier model balancing capability and cost. Strong on coding and agentic tasks with 200k context.

Claude Sonnet 4.5

Anthropic's previous-gen balanced model

API
Provider: anthropic · Type: llm · Params: undisclosed · Context: 200k

Strong coding model with hybrid reasoning. 77.2% on SWE-bench Verified under the standard setup, 82% with parallel test-time compute.

GPT-5

OpenAI's frontier model

API
Provider: openai · Type: llm · Params: undisclosed · Context: 128k

OpenAI's most capable model with native tool use and extended reasoning. 94.6% on AIME 2025 without tools.

GPT-4o

OpenAI's multimodal model

API
Provider: openai · Type: multimodal · Params: undisclosed · Context: 128k

Fast multimodal model handling text, images, and audio natively. Optimized for speed and cost.

Gemini 2.5 Pro

Google's frontier reasoning model

API
Provider: google · Type: llm · Params: undisclosed · Context: 1M

Google's thinking model with strong multimodal and coding performance. 89.8% on Global MMLU.

Llama 4 Maverick

Meta's open-weight MoE model

Open Weight
Provider: meta · Type: llm · Params: 400B MoE · Context: 128k

Mixture-of-experts open-weight model with 400B total parameters (17B active per token), competitive with proprietary frontier models.

DeepSeek V3

DeepSeek's flagship MoE model

Open Source
Provider: deepseek · Type: llm · Params: 671B MoE (37B active) · Context: 128k

Chinese-developed MoE model with 671B total/37B active parameters. 88.5% MMLU, extremely low inference cost.

DeepSeek R1

DeepSeek's reasoning model

Open Source
Provider: deepseek · Type: reasoning · Params: 671B MoE · Context: 128k

Open-source reasoning model with chain-of-thought output, released under the MIT license. 90.8% MMLU, 97.3% MATH-500.

Categories