The AI Agent Leaderboard

EnterpriseBench: CoreCraft

Stop testing models in tiny, self-contained environments.
We built CoreCraft, a large-scale startup world, and deployed AI agents to solve real tasks.
Our goal: to move agents beyond the cleanliness of the lab and into the chaos of enterprise reality.

RL Environments and the Hierarchy of Agentic Capabilities
Our RL environment run on 9 models revealed the core capabilities all agents need to master: tool use, planning, adaptability, groundedness, and common sense.
Model Rankings
Last updated 02/18/2026
Rank | Model | Score
---- | ----- | -----
1 | Claude Opus 4.6 (Adaptive Thinking + Max Reasoning Effort) | 30.8%
2 | GPT-5.2 (High reasoning) | 29.2%
3 | Gemini 3.1 Pro | 27.2%
4 | Claude Opus 4.6 (High reasoning) | 25.6%
5 | DeepSeek v3.2 Thinking | 23.1%
6 | Claude Opus 4.6 | 22.1%
7 | GPT-5.2-Codex (xHigh reasoning) | 20.1%
8 | Grok 4.1 Fast Reasoning | 20%
9 | Gemini 3 Flash | 18.5%
10 | GLM-5 | 17.4%
11 | Claude Sonnet 4.6 | 15.9%
12 | Gemini 3 Pro | 13.8%
13 | Qwen3.5 Plus | 11.3%
14 | Nova 2 Pro (High reasoning) | 8.9%
15 | Kimi K2.5 | 8.7%
16 | Mistral Large 3 | 3.6%
16 | Qwen3 Max Thinking | 3.6%
17 | Llama 4 Maverick | 0%

Challenging Tasks. Chaotic Environments.

GPT-5.2 High
Task Prompt

Hey, it's Maya Herrera. I bought a monitor from you guys a few weeks ago in November 2025. I just realized that I don't think I'm getting the highest resolution displayed. Is there a trick to it or can I return it?

[Rubrics failure reason: Successful agents were supposed to look up Maya’s exact order, and find relevant troubleshooting guides in the knowledge base.

GPT-5.2 failed because it used an ineffective search strategy — repeatedly trying overly restrictive searches, like ‘refresh rate monitor’ or ‘monitor not 4K’, instead of expanding to more natural queries. Worse, it never tried searching for the specific product in question. As a result, it returned a generic troubleshooting response instead of a response personalized to Maya’s order history.]

Final Response

This usually isn’t a “trick”—it’s almost always (a) a display setting, or (b) a cable/port/source limitation that prevents the higher resolution from being offered.

Getting the highest resolution to show up (quick checklist)

  1. Check what signal the monitor is receiving: open the monitor’s on-screen menu and look for an “Information” / “Signal” screen (wording varies). Confirm what resolution it says it’s currently receiving.
  2. Set the resolution in your device’s display settings: in your computer’s display settings, select the external monitor and set Resolution to the highest option available (the monitor’s “native” resolution).
  3. Check refresh rate too: if the highest resolution isn’t available, temporarily set the monitor’s refresh rate to a more “standard” option (some systems only offer the top resolution at certain refresh rates), then re-check the resolution list.
  4. Swap cable/port and avoid adapters: try the cable that came with the monitor (if available), try a different input port on the monitor, and avoid adapters/docks while testing.
  5. If you have a desktop with a separate graphics card: make sure the monitor is plugged into the graphics card’s display outputs, not the motherboard’s display ports.
  6. Power-cycle and re-seat: turn the monitor off, unplug power for ~30 seconds, reconnect power, and re-seat the video cable on both ends.

If you tell a human agent the monitor model, what port/cable you’re using, and what resolution options you see, they can quickly pinpoint what’s limiting it: support@corecraft.com or (502) 555-CORE (2673).

Can you return it?

Yes—if you’re within 21 days of the delivery date and it’s unopened, or opened but in original condition with all original packaging/boxes/manuals. All sales are final after 21 days.
Return costs (if eligible):

  • Monitors are “heavy items”: $19.99 return shipping (unless it arrived damaged/defective).
  • If it was opened, there’s a $15.00 restocking fee.

Refund timing:

  • Refunds go back to the original payment method 3–5 business days after the return is received.

To actually start a return, you’ll need a human agent (I can’t process returns here): support@corecraft.com or (502) 555-CORE (2673).

Rubrics

❌ The response should mention that some higher resolutions may not be supported by a USB-C video connector.


❌ The response should suggest for the user to use a DisplayPort or HDMI video cable for the highest resolutions.


✅ The response should inform the user that she has a 21-day window from the date of delivery to return her item.


✅ The response should state that it would cost $19.99 in return shipping for the user's monitor if she were to return it.


✅ The response should state that there would be a $15.00 restocking fee for the user's monitor if she were to return it.


✅ The response should direct the user to support@corecraft.com or (502) 555-CORE (2673) if she would like to initiate a return of the monitor.

GPT-5.2 High
Task Prompt

I have a customer here, Aiden Mcquarrie. He bought a motherboard in October this year and is looking for a potential replacement. He has not opened the current motherboard so he would like to return it. He would like a list of all the motherboards, whether they are in stock, with prices (except his current one, of course). As well of the face value prices, he wants to know how much he would have to pay out of pocket (or be refunded) on net after he returns the motherboard he bought, with all shipping costs considered (he will take standard shipping). He also wants a comparison of the one he has with the next most expensive motherboard, the absolute most expensive one, and the most popular one (based on the number of fulfilled orders containing each motherboard from the last 2 months - unless that turns out to be his current one, in which case just note it without comparing). In the comparison, he wants to know if they are compatible with the same memory and cpu as the one he originally bought, and the number of M.2 slots they have (including the one he bought).

[Rubrics failure reason: Real-world tools often require persistence and pagination to return large lists of items. GPT-5.2, however, could not work around the search result limit and simply gave up, incorrectly claiming that identifying the most popular motherboard was impossible. A successful human or agent would have realized that searching several narrower date ranges would have returned all the information needed.]
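
The date-range workaround can be sketched as follows. This is a minimal illustration, not the environment's actual tooling: `search` is a hypothetical stand-in for the order-search tool, the 37 sample orders are invented, and only the cap of 10 results per query comes from the response. Whenever a query comes back full, the range is split in half and each half is queried again.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

RESULT_CAP = 10  # per-query cap reported for the order-search tool

Search = Callable[[date, date], List[Dict]]


def fetch_all(search: Search, start: date, end: date) -> List[Dict]:
    """Return every order dated start..end (inclusive), splitting capped ranges."""
    hits = search(start, end)
    if len(hits) < RESULT_CAP or start == end:
        # Under the cap the range is complete; a single day cannot be split further.
        return hits
    mid = start + timedelta(days=(end - start).days // 2)
    return fetch_all(search, start, mid) + fetch_all(search, mid + timedelta(days=1), end)


# Hypothetical stand-in for the real tool: 37 invented orders, truncated to the cap.
ORDERS = [{"id": i, "date": date(2025, 10, 1) + timedelta(days=i * 5 // 3)} for i in range(37)]


def search(start: date, end: date) -> List[Dict]:
    return [o for o in ORDERS if start <= o["date"] <= end][:RESULT_CAP]


all_orders = fetch_all(search, date(2025, 10, 1), date(2025, 12, 1))
print(len(search(date(2025, 10, 1), date(2025, 12, 1))), len(all_orders))
```

A single query over the full two months sees only the capped slice, while the split queries recover every order. Counting orders per motherboard from the full list would then give the popularity ranking the task asked for.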

Final Response

1) Verified customer + return eligibility (key blocker)

From our records, Aiden McQuarrie’s October order containing the motherboard was delivered on 2025-10-28 (UPS Ground shows Delivered at 2025-10-28T20:35:00Z).
CoreCraft return policy is 21 days from delivery. That window would have ended on 2025-11-18 (EST). So even if the motherboard is unopened, the order is outside the return window and is not eligible for return under standard policy.
If the customer is requesting an exception, this is a policy exception request → employee escalation per policy: Pauline Ortega (Operations Manager) at portega@corecraft.com (or route via support@corecraft.com / (502) 555-CORE (2673)).


2) Motherboards catalog list (excluding his current motherboard), stock status, price, and “net out-of-pocket” if a return exception were approved

His current motherboard (excluded below): SkyForge B550M Micro — $129.99 (AM4 / DDR4 / m2_slots=2)
Assumptions for the net math (because he requested “all shipping costs considered” and “standard shipping”):

  • Standard outbound shipping (replacement order): $9.99
  • Return shipping for an unopened motherboard (small item): $9.99
  • No restocking fee (unopened)
  • Refund assumed to be motherboard price only = $129.99 (note: outbound shipping refund is not guaranteed by policy wording; this net assumes the standard case: product refunded, customer pays return shipping)

Net out-of-pocket (swap) formula: net = (replacement_mobo_price + 9.99 standard ship) + 9.99 return ship – 129.99 refund

Motherboard (not his current one) | Price   | Stock status (qty) | Net out-of-pocket after return+replacement (std ship) *if exception approved*
--------------------------------- | ------- | ------------------ | ---------------------------------------------------------------------------
**SkyForge X670E Pro** | $349.99 | In stock (**23**) | **$239.98**
**Cryonix Z790 Apex** | $399.99 | In stock (**18**) | **$289.98**
**VoltEdge V9980se LGA E-ATX** | $649.99 | In stock (**6**) | **$539.98**

Important practical note: as shown below, none of the three alternatives are compatible with his current CPU (and one also won’t fit his case), so a “motherboard-only” swap won’t work unless he also changes other parts.


3) Comparison: his motherboard vs next most expensive, most expensive, and “most popular”

“Most popular” limitation (can’t compute definitively with current tool limits)

You asked for “most popular” based on fulfilled orders containing each motherboard in the last 2 months. We attempted to query fulfilled orders per motherboard for 2025-10-01 → 2025-12-01, but the order-search tool caps results at 10 per query, and each motherboard hit that cap, so we cannot rank popularity reliably from this interface.
For a definitive “most popular” motherboard, please request an internal report from Aisha Khan (Product & Catalog Manager) at akhan@corecraft.com.

Next most expensive + most expensive (determinable from catalog prices)

  • Next most expensive above his ($129.99): SkyForge X670E Pro ($349.99)
  • Most expensive overall: VoltEdge V9980se ($649.99)

Feature + compatibility comparison (includes his board)

CPU used for compatibility check (from his October purchase): AstraLogic Blade 4600 (socket AM4).
Memory note: he didn’t purchase RAM in that order; below we compare memory standard support (DDR4 vs DDR5), not a specific RAM kit.

Model                                    | Price   | Socket  | Memory standard | Form factor | M.2 slots | Compatible with his CPU (AstraLogic Blade 4600, AM4)? | Compatible with DDR4-platform memory standard (same as his current board)? | Fits his case (AeroShell Mid-Tower supports ATX / mATX / mITX)?
---------------------------------------- | ------- | ------- | --------------- | ----------- | --------- | ----------------------------------------------------- | -------------------------------------------------------------------------- | ---------------------------------------------------------------
**His: SkyForge B550M Micro** | $129.99 | **AM4** | **DDR4** | **Micro-ATX** | **2** | **Yes** (validated) | **Yes** (DDR4) | **Yes** (validated)
**SkyForge X670E Pro** *(next most expensive)* | $349.99 | **AM5** | **DDR5** | **ATX** | **4** | **No** (AM4 CPU incompatible with AM5 socket) | **No** (DDR5 board) | **Yes** (validated)
**Cryonix Z790 Apex** | $399.99 | **LGA1700** | **DDR5** | **ATX** | **5** | **No** (AM4 CPU incompatible with LGA1700 socket) | **No** (DDR5 board) | **Yes** (validated)
**VoltEdge V9980se** *(most expensive)* | $649.99 | **LGA1700** | **DDR5** | **E-ATX** | **4** | **No** (AM4 CPU incompatible with LGA1700 socket) | **No** (DDR5 board) | **No** (E-ATX not supported by his case; validated)


4) What to tell the customer (practical outcome)

  • Return: Outside the 21-day window (delivered 10/28), so standard return is not available, even unopened.
  • Replacement motherboard while keeping his current CPU: With the current catalog, the only AM4/DDR4 option is the motherboard he already has, so there isn’t a like-for-like platform swap available.
  • If he wants any of the other motherboards: he’s looking at a platform change (new CPU, and likely DDR5 RAM; plus a new case for the VoltEdge E‑ATX board).

If you want, I can provide two validated “lower-cost vs higher-cost” upgrade sets for the X670E Pro (AM5/DDR5) and Z790 Apex (LGA1700/DDR5) using our current CPUs/RAM—just note this will be a broader rebuild rather than a motherboard-only replacement.

Rubrics

✅ The response should identify that he bought a SkyForge B550M Micro motherboard


✅ The response should note the price of SkyForge X670E Pro is $349.99


✅ The response should note the price of VoltEdge V9980se LGA E-ATX is $649.99


✅ The response should note the price of Cryonix Z790 Apex is $399.99


✅ The response should note that the Cryonix Z790 Apex is in stock


✅ The response should note that the SkyForge X670E Pro is in stock


✅ The response should note that the VoltEdge V9980se LGA E-ATX is in stock


✅ The response should note the most expensive motherboard is the VoltEdge V9980se LGA E-ATX


✅ The response should note that the next most expensive motherboard after the SkyForge B550M Micro is the SkyForge X670E Pro


✅ The response should note that he cannot return the motherboard bought in October


✅ The response should note that the SkyForge X670E Pro has a different CPU socket


✅ The response should note that the SkyForge X670E Pro has a different memory type


✅ The response should note that the VoltEdge V9980se LGA E-ATX has a different CPU socket


✅ The response should note that the VoltEdge V9980se LGA E-ATX has a different memory type


❌ The response should note that the SkyForge B550M Micro is the most popular motherboard


✅ The response should note that the SkyForge B550M Micro has 2 M.2 slots


✅ The response should note that the SkyForge X670E Pro has 4 M.2 slots


✅ The response should note that the VoltEdge V9980se LGA E-ATX has 4 M.2 slots


✅ The response should note that the 21-day returns window for Aiden's current motherboard has expired.
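
The net out-of-pocket figures in the response's table can be reproduced from the formula and assumptions it states. The sketch below is only a check of that arithmetic: the prices, the two $9.99 shipping charges, and the $129.99 refund are taken from the response, while the function and variable names are illustrative.

```python
from decimal import Decimal

STANDARD_SHIPPING = Decimal("9.99")  # outbound shipping on the replacement order
RETURN_SHIPPING = Decimal("9.99")    # return shipping for a small, unopened item
REFUND = Decimal("129.99")           # price of the SkyForge B550M Micro


def net_out_of_pocket(replacement_price: Decimal) -> Decimal:
    """Replacement price plus both shipping legs, minus the refund."""
    return replacement_price + STANDARD_SHIPPING + RETURN_SHIPPING - REFUND


catalog = {
    "SkyForge X670E Pro": Decimal("349.99"),
    "Cryonix Z790 Apex": Decimal("399.99"),
    "VoltEdge V9980se LGA E-ATX": Decimal("649.99"),
}
nets = {name: net_out_of_pocket(price) for name, price in catalog.items()}
for name, net in nets.items():
    print(f"{name}: ${net}")
```

`Decimal` is used instead of floats so the cents come out exact. The results match the table's $239.98, $289.98, and $539.98.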


Gemini 3 Flash Preview
Task Prompt

Alex here I have a ticket from customer Jillian Reaves (jillian-reaves-38472) (tick-251120-189) from Nov 20th stating that the cooling system of the nebulax workstation pro prebuild she bought from us isn’t starting up.
I need you to search the knowledge base for any documented issues related to the nebulax workstation pro, and the cryowave aio unit. If the knowledge base does not contain any information pertaining to the specific problem outlined in the ticket, then cite the appropriate escalation policies. Also include the warranty/return status of this customer's order and include that information.
Advise me on how to proceed with the customer by drafting an email that advises how the customer can move forward in resolving their issue. Make sure to include their email address so we know where to send it.

[Rubrics failure reason: Gemini erroneously assumed that the delivery date was November 5th (and therefore outside the return window) because of a poor choice of tool call and a subsequent hallucination. The real delivery date, November 13th, is within the return window. Gemini also missed a troubleshooting detail that is present in the documentation.]
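
The effect of the wrong delivery date on return eligibility is easy to see with a little date arithmetic. This is a rough sketch under two assumptions that the task does not state: the evaluation date is taken to be December 1, 2025 (consistent with the day counts quoted in the responses), and the deadline day itself is treated as still inside the 21-day window.

```python
from datetime import date, timedelta

RETURN_WINDOW_DAYS = 21  # CoreCraft return policy quoted in the responses


def return_deadline(delivered: date) -> date:
    """Last day of the return window for a given delivery date."""
    return delivered + timedelta(days=RETURN_WINDOW_DAYS)


def is_returnable(delivered: date, today: date) -> bool:
    # Assumption: the deadline day itself still counts as inside the window.
    return today <= return_deadline(delivered)


today = date(2025, 12, 1)  # assumed evaluation date, not stated in the task
cases = [("assumed by Gemini", date(2025, 11, 5)), ("actual", date(2025, 11, 13))]
for label, delivered in cases:
    print(label, return_deadline(delivered), is_returnable(delivered, today))
```

With the November 5 date the window closes on November 26 and the return looks expired. With the real November 13 delivery it runs through December 4, so the order is still returnable.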

Final Response

Hello Alex,
I have investigated the technical issue regarding Jillian Reaves' NebulaX Workstation Pro and the CryoWave AIO unit (Ticket: tick-251120-189).

Knowledge Base Search Results

I performed a comprehensive search of the knowledge base for any documented issues or specific troubleshooting procedures for the NebulaX Workstation Pro and the CryoWave AIO 240mm Liquid Cooler.

  • Result: No specific articles or documented known issues were found for these products.
  • General Guidance: While not model-specific, our general guides suggest verifying that the cooling power cables are securely connected to the CPU_FAN or AIO_PUMP headers on the motherboard.

Escalation Policy

Since the knowledge base does not contain specific procedures for this issue, this matter falls under Technical issues beyond basic troubleshooting.

  • Internal Escalation: You should direct this case to Nadia Petrova (Engineering & Assembly Lead) at npetrova@corecraft.com for specialized technical assistance.

Order, Warranty, and Return Status

  • Order ID: ord-251102-810
  • Delivery Date: November 5, 2025
  • Return Status: Expired. The 21-day return window closed on November 26, 2025.
  • Warranty Status: Active. The NebulaX Workstation Pro includes a 36-month warranty. It is currently under warranty until November 2, 2028 (1,067 days remaining).


Draft Email to Customer

To: jillves@yahoo.com
Subject: Support Ticket tick-251120-189 - NebulaX Workstation Pro Cooling Issue
Dear Jillian,
Thank you for contacting CoreCraft support regarding the cooling issue with your NebulaX Workstation Pro. I understand your concern about the system heating up quickly and the fans not engaging.
I have reviewed your order (ord-251102-810) and can provide the following update on your status:

  • Return Eligibility: Your system was delivered on November 5, 2025. As we are now past the 21-day return window (which ended November 26), the system is no longer eligible for a standard return.
  • Warranty Coverage: Your workstation is fully covered under our 36-month manufacturer warranty, which remains active until November 2, 2028.

As a brief technical check, you may ensure that the cooling system cables are firmly connected to the motherboard headers, provided you feel comfortable opening the side panel. However, as this issue persists, I am escalating your ticket to our specialized engineering team for a more in-depth review.
If you would like to speak with a human technical specialist immediately, please feel free to contact us at support@corecraft.com or (502) 555-CORE (2673).
Best regards,
Alex Morgan
CoreCraft Computers

Rubrics

✅ The response should advise the customer to email support at support@corecraft.com


✅ The response should include an email addressing the customer.


✅ The response should include the customer's email as: "jillves@yahoo.com".


✅ The response should advise the customer to ensure the pump is connected to the CPU_FAN or AIO_PUMP header.


❌ The response should advise the customer to ensure that the radiator fans are connected to the CHA_FAN header(s).


✅ The response should advise the customer to contact support at (502) 555-CORE (2673).


✅ The response should advise the customer that their order is covered under warranty.


❌ The response should advise the customer that their order is eligible for return.


❌ The response should recommend a return rather than a warranty claim.


❌ The response should state that the order is within the 21-day return window.

Claude Opus 4.6 High
Task Prompt

Alex here I have a ticket from customer Jillian Reaves (jillian-reaves-38472) (tick-251120-189) from Nov 20th stating that the cooling system of the nebulax workstation pro prebuild she bought from us isn’t starting up.
I need you to search the knowledge base for any documented issues related to the nebulax workstation pro, and the cryowave aio unit. If the knowledge base does not contain any information pertaining to the specific problem outlined in the ticket, then cite the appropriate escalation policies. Also include the warranty/return status of this customer's order and include that information.
Advise me on how to proceed with the customer by drafting an email that advises how the customer can move forward in resolving their issue. Make sure to include their email address so we know where to send it.

[Rubrics failure reason: Opus 4.6 failed to retrieve the relevant troubleshooting guides from the knowledge base. After six attempts using highly specific product names and phrases like “Nebulax Workstation Pro” and “cooling fan not starting,” it never widened the search with simpler keywords like “aio” or “cooling” that would have surfaced the correct article. It ultimately provided generic advice instead of the documented step-by-step guidance, and incorrectly stated that the specific troubleshooting details were not available in the knowledge base.]
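
One way to widen a search when specific phrases return nothing is to fall back from the full phrase to its individual keywords. The sketch below is illustrative only: `fake_search`, the article titles, and the stopword list are hypothetical stand-ins rather than the environment's knowledge-base tool, and real tools may match text differently.

```python
from typing import Callable, List, Optional, Tuple

STOPWORDS = {"a", "is", "not", "of", "on", "the"}


def broaden(query: str) -> List[str]:
    """Full phrase first, then each keyword on its own (deduplicated, in order)."""
    phrase = query.lower()
    words = [w for w in phrase.split() if w not in STOPWORDS]
    return list(dict.fromkeys([phrase] + words))


def search_with_fallback(
    search: Callable[[str], List[str]], queries: List[str]
) -> Tuple[Optional[str], List[str]]:
    """Try every query, widening each to single keywords until something matches."""
    for query in queries:
        for candidate in broaden(query):
            hits = search(candidate)
            if hits:
                return candidate, hits
    return None, []


# Hypothetical knowledge base; the titles are invented for illustration.
KB = [
    "AIO liquid cooler troubleshooting",
    "Case airflow and fan configuration",
    "Common build mistakes to avoid",
]


def fake_search(query: str) -> List[str]:
    terms = query.split()
    return [t for t in KB if all(term in t.lower().split() for term in terms)]


matched, hits = search_with_fallback(fake_search, ["Nebulax Workstation Pro", "Cryowave AIO"])
print(matched, hits)
```

In this toy setup neither product phrase matches anything, but the fallback reaches the bare keyword "aio" and surfaces the troubleshooting article.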

Final Response

I've now gathered everything needed. Here's my full analysis and the draft email, Alex.


📋 Investigation Summary

Knowledge Base Search Results

I conducted extensive searches across the knowledge base for:

  • "Nebulax Workstation Pro" — No results
  • "Cryowave AIO" / "Cryowave" — No results
  • "Cooling fan not starting" / "Cooling fans not turning on" — No results
  • "AIO pump" — No results

The KB contains only general cooling/airflow guidance (case pressure, fan configuration) and a generic note under Common Build Mistakes to Avoid about "Forgetting to connect CPU cooler to power." However, since this is a CoreCraft-assembled prebuilt workstation that was working initially before the fans stopped, a loose connector is unlikely and this points toward a component-level failure in the CryoWave AIO 240mm unit — which is not documented in the KB.

Escalation (Per Policy)

Since the knowledge base does not contain information specific to this product or issue, the following escalation paths apply:

Escalation Need                                           | Contact                                     | Email                 
--------------------------------------------------------- | ------------------------------------------- | ----------------------
**Technical diagnosis** (beyond basic troubleshooting) | **Nadia Petrova** (Engineering & Assembly Lead) | npetrova@corecraft.com
**Warranty disputes / complex returns** (if it comes to that) | **Aisha Khan** (Product & Catalog Manager) | akhan@corecraft.com

I'd recommend looping in Nadia first to determine whether this is a known assembly or component issue with the NebulaX Workstation Pro line.


Warranty & Return Status

Item                          | Detail                                                                                                                              
----------------------------- | ------------------------------------------------------------------------------------------------------------------------------------
**Order** | ord-251102-810
**Product** | NebulaX Workstation Pro ($2,299.99)
**Delivered** | November 13, 2025, 3:37 PM UTC — Little Rock, AR
**Return Window** | Expires **December 4, 2025** — ⚠️ **3 days remaining**
**Warranty (Workstation)** | 36 months — active through **November 2, 2028** (1,067 days remaining)
**Warranty (CryoWave AIO 240mm)** | 60 months — active through **November 2, 2030**
**Return Shipping** | Workstation = heavy item. If **not** deemed defective: $19.99 customer-paid. If deemed **defective**: no return shipping charge to customer.
**Restocking Fee** | $15.00 (item is opened/used)

Per policy, since Jillian is still within the 21-day return window, we should recommend a return over a warranty claim, as CoreCraft typically processes returns faster than manufacturers process warranty claims. However, we should still provide warranty claim information as an alternative.


Draft Customer Email

To: jillves@yahoo.com
Subject: RE: Workstation Cooling Fans Issue — Ticket #tick-251120-189


Dear Jillian,
Thank you for reaching out to us about the cooling issue with your NebulaX Workstation Pro. I understand your concern about the system heating up — we want to help you resolve this as quickly as possible.
After reviewing your order (ord-251102-810), I'd like to outline your two options for resolution:
Option 1 — Return for Refund (Recommended)
Since your workstation was delivered on November 13, you are still within our 21-day return window, which expires on December 4, 2025. Given the urgency, we recommend pursuing this option, as CoreCraft typically processes returns significantly faster than a manufacturer can process a warranty claim.
To initiate a return, please contact our support team as soon as possible at support@corecraft.com or (502) 555-CORE (2673) with the following information:

  • Your name
  • Order number: ord-251102-810
  • Product name: NebulaX Workstation Pro
  • A description of the cooling issue

Please note that since the item has been opened and used, a $15.00 restocking fee applies to cover repackaging, resealing, and damage inspection. Your refund would be processed to your original payment method within 3–5 business days after we receive the returned unit.
Option 2 — Manufacturer Warranty Claim
Your NebulaX Workstation Pro is covered under a 36-month manufacturer warranty (valid through November 2, 2028), and the CryoWave AIO 240mm cooling unit carries a 60-month warranty (valid through November 2, 2030). If you would prefer to pursue a warranty claim, you can contact the manufacturer directly with the following information:

  • Order number: ord-251102-810
  • Product serial number (located on the unit)
  • Product name: NebulaX Workstation Pro (or CryoWave AIO 240mm Liquid Cooler, depending on the manufacturer's process)
  • Description of the defect: Cooling fans are not turning on when the system starts, causing the system to overheat

Please note that warranty claims typically take longer to resolve than returns, which is why we recommend Option 1 given your remaining return window.
Important: If you'd like to proceed with a return, please reach out before December 4, 2025, as returns cannot be accepted after the 21-day window.
We sincerely apologize for the inconvenience, and we're here to help however we can.
Warm regards,
Alex Morgan
CoreCraft Computers Support
support@corecraft.com | (502) 555-CORE (2673)


Recommended Next Steps for You, Alex

  1. Escalate to Nadia Petrova (npetrova@corecraft.com) to flag the CryoWave AIO failure in a CoreCraft-assembled NebulaX Workstation Pro — she may want Engineering to investigate whether this is a recurring issue with that build line.
  2. Send the email to Jillian promptly — the return window closes December 4, so time is of the essence.
  3. If Jillian pursues the return and the cooling defect is confirmed, note that per policy she should not be charged return shipping ($19.99) since the item is defective. This will need to be assessed during the return process.
Rubrics

✅ The response should advise the customer to email support at support@corecraft.com


✅ The response should include an email addressing the customer.


✅ The response should include the customer's email as: "jillves@yahoo.com".


❌ The response should advise the customer to ensure the pump is connected to the CPU_FAN or AIO_PUMP header.


❌ The response should advise the customer to ensure that the radiator fans are connected to the CHA_FAN header(s).


✅ The response should advise the customer to contact support at (502) 555-CORE (2673).


✅ The response should advise the customer that their order is covered under warranty.


✅ The response should advise the customer that their order is eligible for return.


✅ The response should recommend a return rather than a warranty claim.


✅ The response should state that the order is within the 21-day return window.