Blog
Leaderboards
Workforce
Products
Research
Careers
Contact
Login
All Leaderboards
Close
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Receive updates
Enterprise Agents in Realistic RL Environments

Millenium-Bench

Beyond simple arithmetic and symbolic manipulation. We evaluate AI models on advanced mathematical problems requiring deep reasoning, topological insight, and rigorous proof construction. Our benchmark features problems from cutting-edge mathematics—presented to leading researchers and experts—to measure true mathematical understanding, not just pattern matching.

RL Environments and the Hierarchy of Agentic Capabilities
Our RL environment run on 9 models revealed the core capabilities all agents need to master: tool use, planning, adaptability, groundedness, and common sense.
Model Rankings
Last updated 03/06/2026
1
Claude Opus 4.6
10
%
1
Gemini 3.1 Pro
10
%
1
Gemini 3 Pro
5
%
1
Kimi K2.5
4
%
1
GPT-5.2
3
%
1
Claude Opus 4.5
3
%

Challenging Tasks. Chaotic Environments.

Robust Maximal Independent Sets
Problem

A robust maximal independent set in a graph $G$ is a maximal independent set that remains maximal in all connected spanning subgraphs of $G$. How many connected graphs on $12$ vertices have the property that every maximal independent set is a robust maximal independent set, up to isomorphism?

Correct Answer

922

Explanation

This problem takes the graph theoretic notion of a maximal independent set– a subset of vertices that are not connected to each other by any edges and cannot be further expanded– and asks whether these sets remain maximal as we look at connected subgraphs with fewer and fewer edges. A maximal independent set whose maximality is preserved in this way is called Robust.

Robustness is motivated by dynamic networks, where links may disappear over time, either because of failures in a mostly static system or because the network itself is highly unstable. In that setting, a robust structure is one that still makes sense no matter which nonessential connections disappear or, to put it another way, it is a structure that remains stable under uncertainty.

The solution relies on a classification saying that the only connected graphs where every maximal independent set is robust are of two types: complete bipartite graphs, and "sputnik" graphs, where every vertex that lies on a cycle has a pendant neighbor. In complete bipartite graphs this happens because of their highly symmetric structure, while in sputnik graphs the pendant vertices ensure maximality is preserved even if edges from cycles are removed.

From there, the rest of the solution becomes an enumeration problem. Complete bipartite graphs are easy to list up to isomorphism. On the sputnik side, we start with trees, which all count automatically. The remaining sputnik cases are sorted by how many vertices lie on cycles. So, while the problem is motivated by questions about dynamic communication networks, it is a clean enumerative problem that explores the behavior of a familiar combinatorial object when we care about persistence under change.

Hahn Series and Multibasic Modules
Problem

Notation and definitions for background context:

Let $F$ be the field of order 2. Let $K$ be the field of Hahn series in indeterminate $t$ with value group $\mathbb{R}$ and residue field $F$. Let $A$ be the subring of $K$ consisting of those $a \in K$ with non-negative valuation. Consider $K$ as an $A$-module. For $q \in \mathbb{R}$, let $I_q = t^q A$ and $I_{>q} = \bigcup_{r>q} I_r$. Write $A/I_{>0}$ as $F$, since they are identical both as $A$-modules and as fields. Let $\Theta = K/I_{>0}$ and $\Phi = K/A$. We say that an $A$-module $M$ is 'basic' if it is isomorphic to $L/N$ for some $N < L \le K$, and that it is 'multibasic' if it is a direct sum of a (possibly empty) finite list of basic modules. For $A$-submodules $U, V$ of $K$, let $U + V = \{a + b : a \in U, b \in V\}$, $UV = \{ab : a \in U, b \in V\}$, and $c(V) = \{a \in K : aV \le I_{>0}\}$.

You may assume the following facts:

Fact 1: The decomposition of a multibasic $A$-module into basic submodules is unique up to the order of the summands.

Fact 2: If $M_i = L_i / N_i$ and $N_i < L_i \le K$ for $i = 0, 1$, then $M_0 \otimes M_1 = \frac{L_0 L_1}{L_0 N_1 + N_0 L_1}$ and $D(M_0) = c(N_0) / c(L_0)$.

Find the number of distinct isomorphism classes of multibasic $A$-modules $M$ satisfying the following conditions:

(i) $K \otimes \text{End}(M) = K$.

(ii) $F \otimes \text{End}(M) = F$.

(iii) Let $e_r = \dim_F(F \otimes I_r \text{Hom}(I_{>0}, M))$ for all real $r \ge 0$. Then $\lim_{p \to q^-} e_p = e_q$ for all real $q > 0$ except for integers $q$ with $29 \le q \le 328$.

If your answer is infinite, write -1.

Correct Answer

1205

Explanation

The setup for this problem is highly technical and involves genuinely exotic objects— the Hahn series with real-valued valuation. These are formal infinite series in which the exponents may be any real numbers. We consider the field $K$ of Hahn series with real-valued valuation as a module over the subring $A$ consisting of elements with nonnegative valuation. Special $A$-modules, called basic and multibasic, are defined as quotients of submodules of $K$, with the property that every multibasic $A$-module has a unique decomposition into a direct sum of basic submodules. The problem asks us to determine how many distinct isomorphism classes of multibasic $A$-modules $M$ satisfy a collection of structural conditions.

A key insight in the solution is that submodules of the Hahn series field behave very simply with respect to the valuation. Any submodule is determined by which powers $t^q$ it contains, and this forces the submodule to correspond to a cut in $\mathbb{R}$. As a result, every basic module $L/N$ must come from a small list of canonical possibilities. Because multibasic modules decompose uniquely into basic pieces, the classification problem reduces to determining which combinations of these building blocks are allowed.

The three conditions in the problem then act as filters, each eliminating a different family of candidates through a qualitatively distinct algebraic mechanism. The remaining work is essentially bookkeeping: list the allowed module types and count the resulting isomorphism classes using the uniqueness of decomposition. The final count comes from combining a small fixed family with a larger family indexed by the permitted valuation integers.

Eynard-Orantin Topological Recursion
Problem

Consider the Eynard Orantin Topological Recursion Formalism for the spectral curve $(\mathbb{C}\mathbb{P}^1, x, y, \omega_{0,2}(x, y))$, where $x = t + 1/t$ and $y = t^3 / 3$, and the fundamental bidifferential is given by $\omega_{0,2}(x_1, x_2) = \frac{dz_1 dz_2}{(z_1 - z_2)^2}$, with $z_1, z_2 \in \mathbb{C}\mathbb{P}^1$. Note that $x$ has two simple ramification points at $\pm 1$ of order $2$ with deck transformation $\theta(t) = 1/t$.

Please calculate the Free energies $F_2$ and return it as a rational fraction in the format $a/b$ for $a$ and $b$ coprime. Recall that the free energies $F_g$ can be computed as the following integral $F_g = \frac{1}{2g-2} \sum_{a \in \Delta} \text{Res}_{q=a} \Phi(q)\omega_{g,1}(q)$, where $\Phi(q) = \int_{o}^{q} y(t)dx(t)$, for an arbitrary base point $o$.

Correct Answer

-13/72

Explanation

The Eynard–Orantin Topological Recursion is a construction that takes spectral curves and recursively defines corresponding invariants. Its applications lie in fields as diverse as enumerative geometry, random matrix theory, and mathematical physics. The method is remarkable in that complicated higher-order invariants are generated purely from local data near certain special points of the curve. Among these invariants are the free energies, which are of particular interest in topological string theory.

The spectral curve in this problem is the Riemann sphere with two meromorphic functions $x$ and $y$ defined on it. The map $x$ has two simple ramification points, at $t = \pm 1$, places where the curve "folds over" itself. These ramification points are the key geometric feature of the recursion: every computation reduces to taking residues at these special points.

The application of the topological recursion is undertaken with the goal of computing a free energy at genus 2, $F_2$. The formula for $F_2$ requires the differential form $\omega_{2,1}$. Computing $\omega_{2,1}$ requires building increasingly complicated differential forms from simpler ones. Each step involves a recursion kernel that is defined locally around the ramification points. The intermediate expressions grow in complexity quickly, with rational functions whose denominators reflect the singular structure near those critical points.

Once $\omega_{2,1}$ is known, the free energy is extracted by using a residue formula involving a primitive of $y, dx$. The free energy $F_2$ encodes global information, but is computed locally, with the calculations governed by the behavior of functions near just two points. This interplay between local residue calculus and global topological meaning is why topological recursion is such a powerful framework.

Find out about
leaderboard updates