AI Model Claims Atlas
Track epistemic claims about AI models. Every benchmark score, capability assertion, and limitation receives an evidence-backed score, so vendor claims are checked against independent reproduction.
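The idea above of scoring claims by evidence strength can be sketched as a small data model. Everything here is hypothetical: the tier names, weights, and `score_claim` helper are illustrative assumptions, not the atlas's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class EvidenceTier(Enum):
    """Hypothetical tiers, ordered from weakest to strongest support."""
    VENDOR_REPORTED = 1        # score appears only in the vendor's own materials
    THIRD_PARTY_RUN = 2        # at least one independent group reran the benchmark
    INDEPENDENT_CONSENSUS = 3  # multiple independent reproductions agree


@dataclass
class Claim:
    """One benchmark or capability claim about a model."""
    model: str
    benchmark: str
    reported_score: float      # as published, in percent
    tier: EvidenceTier
    sources: list[str] = field(default_factory=list)


def score_claim(claim: Claim) -> float:
    """Toy confidence score: discount the reported number by evidence strength."""
    weights = {
        EvidenceTier.VENDOR_REPORTED: 0.5,
        EvidenceTier.THIRD_PARTY_RUN: 0.8,
        EvidenceTier.INDEPENDENT_CONSENSUS: 1.0,
    }
    return round(claim.reported_score * weights[claim.tier], 2)


# Usage: a vendor-only claim is discounted until someone reproduces it.
claim = Claim("Claude Sonnet 4.5", "SWE-bench Verified", 77.2,
              EvidenceTier.VENDOR_REPORTED)
print(score_claim(claim))  # 38.6
```

The key design choice is that a claim never loses its reported number; the evidence tier is stored alongside it, so the discounted score can be recomputed as reproductions arrive.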
Featured Models
Claude Opus 4.6
Anthropic's most capable model
Frontier reasoning model with 1M context window. Leads on SWE-bench Verified and agentic coding tasks. Extended thinking for complex multi-step problems.
Claude Sonnet 4.6
Anthropic's balanced model
Mid-tier model balancing capability and cost. Strong on coding and agentic tasks with 200k context.
Claude Sonnet 4.5
Anthropic's previous-gen balanced model
Strong coding model with hybrid reasoning. 77.2% on SWE-bench Verified standard, 82% with parallel compute.
GPT-5
OpenAI's frontier model
OpenAI's most capable model with native tool use and extended reasoning. 94.6% on AIME 2025 without tools.
GPT-4o
OpenAI's multimodal model
Fast multimodal model handling text, images, and audio natively. Optimized for speed and cost.
Gemini 2.5 Pro
Google's frontier reasoning model
Google's thinking model with strong multimodal and coding performance. 89.8% on Global MMLU.
Llama 4 Maverick
Meta's open-weight MoE model
Mixture-of-experts open-weight model with 400B total parameters (17B active per token), competitive with proprietary frontier models.
DeepSeek V3
DeepSeek's flagship MoE model
Chinese-developed MoE model with 671B total/37B active parameters. 88.5% MMLU, extremely low inference cost.
DeepSeek R1
DeepSeek's reasoning model
Open-source reasoning model released under the MIT license. 90.8% MMLU, 97.3% MATH-500, with visible chain-of-thought reasoning.
Categories
Large Language Models
General-purpose text generation models
Code Models
Models specialized for code generation
Multimodal Models
Models handling text, image, audio, video
Embedding Models
Vector representation models
Image Generation
Text-to-image and image editing models
Speech Models
Text-to-speech and speech-to-text models
Reasoning Models
Models with chain-of-thought or planning
Agent-Capable Models
Models designed for tool use and autonomy
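The categories above are not mutually exclusive: a single model can be, say, a multimodal reasoning LLM with agent capabilities. One way to sketch that is with bit flags, so a model's category set is a single value. The flag names here are my own shorthand for the categories listed above, not an official taxonomy.

```python
from enum import Flag, auto


class Category(Flag):
    """Hypothetical category flags; a model may carry several at once."""
    LLM = auto()         # general-purpose text generation
    CODE = auto()        # specialized for code generation
    MULTIMODAL = auto()  # text, image, audio, video
    EMBEDDING = auto()   # vector representations
    IMAGE_GEN = auto()   # text-to-image and editing
    SPEECH = auto()      # TTS and STT
    REASONING = auto()   # chain-of-thought or planning
    AGENT = auto()       # tool use and autonomy


# Example: tagging a frontier model with multiple categories at once.
tags = Category.LLM | Category.MULTIMODAL | Category.REASONING
print(Category.REASONING in tags)  # True
print(Category.SPEECH in tags)     # False
```

Flags make membership checks and category-based filtering cheap, while still allowing a model to appear in several catalog sections.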