AI Model Claims Atlas
Track epistemic claims about AI models. Every benchmark score, capability assertion, and limitation receives an evidence-backed score, so vendor claims are checked against independent reproduction.
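The idea above of scoring claims by evidence strength can be sketched as a small data model. Everything here is hypothetical: the tier names, weights, and `score_claim` helper are illustrative assumptions, not the atlas's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceTier(Enum):
    """Hypothetical tiers, ordered from weakest to strongest support."""
    VENDOR_REPORTED = 1        # score appears only in the vendor's own materials
    THIRD_PARTY_RUN = 2        # at least one independent group reran the benchmark
    INDEPENDENT_CONSENSUS = 3  # multiple independent reproductions agree


@dataclass
class Claim:
    """One benchmark or capability claim about a model."""
    model: str
    benchmark: str
    reported_score: float      # as published, in percent
    tier: EvidenceTier
    sources: list[str] = field(default_factory=list)


def score_claim(claim: Claim) -> float:
    """Toy confidence score: discount the reported number by evidence strength."""
    weights = {
        EvidenceTier.VENDOR_REPORTED: 0.5,
        EvidenceTier.THIRD_PARTY_RUN: 0.8,
        EvidenceTier.INDEPENDENT_CONSENSUS: 1.0,
    }
    return round(claim.reported_score * weights[claim.tier], 2)


# Usage: a vendor-only claim is discounted until someone reproduces it.
claim = Claim("Claude Sonnet 4.5", "SWE-bench Verified", 77.2,
              EvidenceTier.VENDOR_REPORTED)
print(score_claim(claim))  # 38.6
```

The key design choice is that a claim never loses its reported number; the evidence tier is stored alongside it, so the discounted score can be recomputed as reproductions arrive.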
Featured Models
Claude Opus 4.6
Anthropic's most capable model
Frontier reasoning model with 1M context window. Leads on SWE-bench Verified and agentic coding tasks. Extended thinking for complex multi-step problems.
Claude Sonnet 4.6
Anthropic's balanced model
Mid-tier model balancing capability and cost. Strong on coding and agentic tasks with 200k context.
Claude Sonnet 4.5
Anthropic's previous-gen balanced model
Strong coding model with hybrid reasoning. 77.2% on SWE-bench Verified standard, 82% with parallel compute.
GPT-5
OpenAI's frontier model
OpenAI's most capable model with native tool use and extended reasoning. 94.6% on AIME 2025 without tools.
GPT-4o
OpenAI's multimodal model
Fast multimodal model handling text, images, and audio natively. Optimized for speed and cost.
Gemini 2.5 Pro
Google's frontier reasoning model
Google's thinking model with strong multimodal and coding performance. 89.8% on Global MMLU.
Llama 4 Maverick
Meta's open-weight MoE model
Mixture-of-experts open-weight model with 400B total parameters (17B active per token), competitive with proprietary frontier models.
DeepSeek V3
DeepSeek's flagship MoE model
Chinese-developed MoE model with 671B total/37B active parameters. 88.5% MMLU, extremely low inference cost.
DeepSeek R1
DeepSeek's reasoning model
Open-source reasoning model released under the MIT license. 90.8% MMLU, 97.3% MATH-500, with visible chain-of-thought reasoning.
Categories
Large Language Models
General-purpose text generation models
Code Models
Models specialized for code generation
Multimodal Models
Models handling text, image, audio, video
Embedding Models
Vector representation models
Image Generation
Text-to-image and image editing models
Speech Models
Text-to-speech and speech-to-text models
Reasoning Models
Models with chain-of-thought or planning
Agent-Capable Models
Models designed for tool use and autonomy
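The categories above are not mutually exclusive: a single model can be, say, a multimodal reasoning LLM with agent capabilities. One way to sketch that is with bit flags, so a model's category set is a single value. The flag names here are my own shorthand for the categories listed above, not an official taxonomy.

```python
from enum import Flag, auto


class Category(Flag):
    """Hypothetical category flags; a model may carry several at once."""
    LLM = auto()         # general-purpose text generation
    CODE = auto()        # specialized for code generation
    MULTIMODAL = auto()  # text, image, audio, video
    EMBEDDING = auto()   # vector representations
    IMAGE_GEN = auto()   # text-to-image and editing
    SPEECH = auto()      # TTS and STT
    REASONING = auto()   # chain-of-thought or planning
    AGENT = auto()       # tool use and autonomy


# Example: tagging a frontier model with multiple categories at once.
tags = Category.LLM | Category.MULTIMODAL | Category.REASONING
print(Category.REASONING in tags)  # True
print(Category.SPEECH in tags)     # False
```

Flags make membership checks and category-based filtering cheap, while still allowing a model to appear in several catalog sections.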