Leaderboards
This is our ranking of models, measured by their capacity for rigorous reasoning and real-world mastery. Discover which labs are leading the frontier.
Antidote: Everyday Edition
A real-world AI leaderboard – real prompts, real stakes, graded by experts who read every word, check every citation, and run every line of code.
Today's release benchmarks everyday use. Agentic and enterprise workflows are coming soon.
GDP.pdf
Can frontier models master the documents that run the world? GDP.pdf is a multimodal reasoning benchmark built from real-world prompts and PDFs pulled directly from expert professional workflows.
Riemann-bench
We evaluate AI models on advanced mathematical problems requiring deep reasoning and novel synthesis. Our benchmark features problems from cutting-edge mathematics, sourced from leading mathematicians in the course of their research: Ivy League professors, PhD IMO medalists, and graduate students at the top of their field.
EnterpriseBench: CoreCraft
Stop testing models in tiny, self-contained environments. We built CoreCraft, a large-scale startup world, and deployed AI agents to solve real tasks. Our goal: to move agents beyond the cleanliness of the lab and into the chaos of enterprise reality.
Hemingway-bench
Stop rewarding slop. We take real-world writing tasks and put them in front of master wordsmiths. Our goal: to push AI writing from two-second vibes to genuine nuance and impact.


