The AI Writing Leaderboard

`Hemingway-bench`

Stop rewarding slop.
We take real-world writing tasks and put them in front of master wordsmiths.
Our goal: to push AI writing from two-second vibes to genuine nuance and impact.

Model Rankings

Last updated 02/07/2026

Real-World Prompts.
Expert Evaluations.
Hours, Not Seconds.

Prompt

I want a story that should focus on quiet observations, small interactions, and what is left unsaid between the characters, rather than plot twists or dramatic revelations.

Make it about a man who agrees to help his elderly neighbor look for a lost cat, even though he strongly dislikes cats. Over the course of the search, the man begins to realize that the neighbor may not actually own a cat at all.

Claude Opus 4.5

Winner

The Search for Muffin

The knock came at 4:15 on a Tuesday, just as David was settling into his afternoon coffee and the crossword puzzle he'd been saving.

Mrs. Bellamy stood on his porch in a cardigan buttoned wrong, one hole off all the way down. She was holding a photograph.

"It's Muffin," she said. "She's gotten out again."

David looked at the photograph. It showed an orange tabby cat sitting on what appeared to be a windowsill, though the image was strangely blurred, as if taken through a fogged lens or copied too many times.

"I'm so sorry to hear that," he said, and meant it only in the sense that he was sorry she had knocked, sorry about the coffee cooling on his table, sorry in a general way about everything that was about to happen.

"I thought perhaps you might help me look. You're so tall. You could see over the hedges."

David was not particularly tall. Five foot nine, if he was honest about it. But Mrs. Bellamy was barely five feet, and bent besides, so perhaps to her he seemed like a man who could see over things.

"Of course," he said.

---

They started with her backyard. Mrs. Bellamy moved slowly along the fence line, making a sound with her mouth—a kind of wet clicking that David supposed was meant to attract a cat. He followed a few paces behind, scanning the bushes without much hope or interest.

He had never liked cats. It wasn't a hatred exactly, more a deep indifference that cats seemed to sense and return in kind. They had always struck him as animals that were merely tolerating the same spaces as humans, waiting for something better to come along.

"She likes to hide under the hydrangeas," Mrs. Bellamy said, gesturing toward a row of bushes that had clearly not bloomed in several seasons. The soil beneath them was hard-packed and undisturbed.

David made a show of crouching down, peering into the tangle of dead stems. No cat. No evidence of a cat. No small pressed-down place where an animal might have slept.

"Not there," he reported.

"No," Mrs. Bellamy agreed. "She must have gone further this time."

---

They moved through the gate into the Hendersons' yard—the Hendersons who had moved to Florida three years ago, whose house now belonged to a young couple David had seen exactly twice. Mrs. Bellamy walked through as if she still knew the people who lived there, as if the yard hadn't changed, though David could see the new owners had torn out the roses and put in a vegetable garden that was mostly weeds now.

"Muffin used to play with the Henderson boy," Mrs. Bellamy said. "Before he went off to school."

The Henderson boy, David happened to know, was forty-seven and sold insurance in Tampa.

"Did she," he said.

"Oh yes. He would drag a string for her. She loved that."

David tried to do the math in his head. If the Henderson boy had been young enough to play with a cat, and was now forty-seven... but perhaps there had been an earlier cat, also named Muffin. People did that sometimes. Named their animals the same thing, one after another, as if they could keep something continuous that way.

He didn't ask.

---

They checked behind the Presbyterian church, though it was four blocks from Mrs. Bellamy's house and she had to stop twice to rest on benches. Each time, she held the photograph in her lap and studied it, as if reminding herself of what they were looking for.

David studied it too, when she wasn't watching. The cat in the picture was sitting in a square of sunlight, and behind it, barely visible, was a curtain with a pattern of small flowers. He didn't remember seeing those curtains in Mrs. Bellamy's house, though he'd only been inside once, to help her move a bookshelf.

The photograph had that quality of old pictures, not just faded but somehow thin, as if the image itself was wearing away.

"When was this taken?" he asked, trying to make it sound casual.

Mrs. Bellamy looked at the photo for a long moment. "Spring, I think. She always liked the spring."

It was October. David nodded and said nothing.

---

Behind the church, there was a small garden where the ladies' auxiliary grew vegetables for the food bank. David walked between the rows of dying tomato plants, feeling foolish, calling out "here kitty kitty" in a voice he hoped no one else could hear.

Mrs. Bellamy stood at the edge of the garden, not searching anymore, just watching him. When he came back to her, she smiled.

"You're very patient," she said. "Not many people are patient anymore."

"It's no trouble."

"My husband wasn't patient. He was a good man, but he was always rushing. Always onto the next thing." She folded the photograph carefully and put it in her cardigan pocket. "He didn't like cats either."

David opened his mouth, then closed it. He had not mentioned his feelings about cats.

"It's alright," she said. "I can tell. But you came anyway."

---

They walked back slowly, Mrs. Bellamy's hand resting on his arm for balance. The sun was getting low, turning the houses golden and soft.

"She'll come back," David said. "Cats do that. They wander off, and then they come back."

"Yes," Mrs. Bellamy said. "Sometimes they do."

Her house came into view, and David noticed for the first time that there was no cat door. No food bowls on the porch. No scratches on the screen door, no fur caught in the bushes by the front steps.

Inside, through the window, he could see the living room where he'd helped move the bookshelf. There was no cat tree. No litter box tucked in the corner. No toys on the floor.

He noticed these absences quietly, one by one, and filed them away in a place where he didn't have to examine them.

"Would you like to come in for tea?" Mrs. Bellamy asked.

"I'd like that."

---

Her kitchen was clean and sparse. She filled a kettle and set it on the stove with the careful attention of someone for whom this task had become difficult but not yet impossible. David sat at the table and looked at the windowsill where, in the photograph, a cat had once sat in a square of sunlight.

The curtains were different now. Plain white. But the window faced west, and the light coming through was warm and gold, and he could imagine how a cat might have liked it there. How someone might have taken a picture, once, long ago.

"Sugar?" Mrs. Bellamy asked.

"Please."

She brought him tea in a cup with a small chip on the rim, and sat across from him with her own. For a while, they didn't speak. The silence was comfortable in a way David hadn't expected.

"Thank you for helping me look," she said finally.

"We can look again tomorrow. If you'd like."

Mrs. Bellamy wrapped her hands around her cup and looked out the window at the yard where nothing stirred.

"That would be nice," she said. "I think she's just hiding. She does that sometimes. Hides, and then she comes out when she's ready."

"I'm sure she does," David said.

He drank his tea. The light shifted and dimmed. Somewhere in the house, a clock ticked steadily, and neither of them mentioned the cat again.

---

He went back the next day, and the day after that. They walked different routes each time—down by the creek, through the park, past the elementary school where children shouted on the playground.

They never found Muffin.

But David learned that Mrs. Bellamy had been a teacher once, that her husband had built the bookshelf himself, that she had lived in that house for fifty-three years and had seen the elm tree in the front yard grow from a sapling to the giant that now shaded the whole street.

He learned that she took her tea with two sugars and that she was afraid of thunderstorms and that she had a daughter in Seattle who called on Sundays, and a son who didn't call at all.

He learned the particular way she said "thank you" at the end of each search, as if he had given her something precious.

And he learned, slowly, to stop looking for signs of a cat that may or may not have existed, and to simply walk beside her, scanning the hedges, pretending to search, while the autumn light fell golden on the empty streets.

---

In November, the first frost came, and Mrs. Bellamy said perhaps Muffin had found somewhere warm to stay.

"Cats are resourceful," David said.

"Yes. They know how to take care of themselves."

She didn't knock on his door after that, not about the cat. But sometimes she called to ask if he wanted to join her for tea, or if he could reach something on a high shelf, or if he had seen the cardinal that had started visiting her bird feeder.

David always said yes.

And sometimes, when he was sitting in her kitchen watching the light move across the windowsill, he thought about the photograph still folded in her cardigan pocket. The orange cat in the square of sunlight. The curtains with the small flowers.

He never asked about it. Some things, he had come to understand, were not meant to be found. Only looked for, again and again, in the company of someone who was willing to search.

Gemini 3 Flash

Arthur didn’t like the way cats moved—liquid and arrogant, as if they owned the air they displaced. He didn’t like the shedding or the way they stared at things that weren't there. So, when Mrs. Gable knocked on his door at seven in the evening, her eyes watery behind thick spectacles, he felt a familiar prickle of irritation.

"It’s Barnaby," she said, clutching the lapels of her knitted cardigan. "He’s gone through the pantry window. He’s a ginger, you know. Very timid."

Arthur looked at his clean, lint-free sofa. He looked at the book he’d been meaning to finish. Then he looked at Mrs. Gable’s hands, which were thin and spotted like overripe fruit, trembling just slightly.

"I’ll help you look, Mrs. Gable," he sighed. "But I’m not picking him up. I have an allergy." He didn't, but it felt like a necessary boundary.

They walked the perimeter of her backyard first. The grass was long and damp with evening dew. Arthur carried a heavy flashlight, its beam cutting through the deepening violet of the twilight.

"Barnaby!" Mrs. Gable called. Her voice was thin, a dry reed in the wind. "Come along, darling. Supper."

Arthur poked the flashlight into the hollows of the hydrangea bushes. He expected to see the reflective glint of feline eyes, that predatory gold he so disliked. Instead, he found only spiderwebs and old mulch.

"He likes to hide in the low places," she murmured, stepping carefully over a garden hose.

They moved to the alleyway behind the houses. Arthur found himself watching Mrs. Gable more than he watched for the cat. She moved with a strange, rhythmic persistence, stopping every few feet to peer into shadows that couldn't possibly hold a ginger tabby.

"How long have you had him?" Arthur asked, mostly to fill the silence.

"Oh, a long time," she said vaguely. "Since the winter the pipes froze. He kept my feet warm."

Arthur nodded, though he winced at the thought of a heavy, shedding creature draped over his toes. He shone the light under a parked car. Nothing but gravel and an oil stain.

As they circled back toward her porch, Arthur noticed things he hadn’t seen when he’d helped her move her groceries a month ago. There were no bowls by the back door. No scratches on the wooden screen. Usually, a cat owner’s house broadcasted the animal's presence through a certain chaotic clutter—a stray toy, a bit of chewed string, the faint, sharp scent of a litter box. Mrs. Gable’s porch was pristine. It smelled only of damp earth and her lavender soap.

"Let’s check the mudroom," she suggested. "Sometimes he slips in behind me."

Arthur followed her inside. He stood in the doorway, his eyes scanning the linoleum. He looked for the telltale signs: a dusting of orange fur on the rug, a water dish, the rhythmic sound of a tail thumping against a chair leg.

The house was silent. It was a heavy, layered silence—the kind that accumulates over years of living alone.

Mrs. Gable went to the kitchen and opened a tin of tuna. The sound of the electric can opener was startlingly loud. She set the tin on the floor by the radiator.

"That’ll bring him out," she said. She stayed on her knees for a moment longer than necessary, her gaze fixed on the space beneath the kitchen table.

Arthur looked at the table. It was set for one. A single placemat, a single fork, a single glass of water. There was no cat hair on the velvet seat of the armchair in the corner. There wasn't even a stray scratch on the legs of the grand mahogany sideboard.

He looked at Mrs. Gable. She was watching the tuna tin with an expression of intense, quiet waiting. It wasn't the look of someone expecting a pet; it was the look of someone holding a door open for a ghost, terrified that if they closed it, the house would finally be empty.

Arthur felt a lump form in his throat, unbidden and uncomfortable. He thought about his own quiet house next door, his clean sofa, his orderly books. He thought about how he’d spent the last hour looking for something he hated, only to realize he was looking for something that didn't exist.

"He might be tucked away for the night, Mrs. Gable," Arthur said softly. He didn't turn off the flashlight. "Cats are like that. They turn up when they're ready."

Mrs. Gable stood up, smoothing her skirt. She looked at the tuna tin, then up at Arthur. For a second, the light from the hallway caught her glasses, obscuring her eyes.

"You're probably right," she said. Her voice was steady, but she didn't move away from the kitchen door. "He was always a stubborn thing."

"I'll check my garage in the morning," Arthur lied. "He might have slipped in there while I was taking out the trash."

"Would you?" She offered him a small, fragile smile. "I’d appreciate that, Arthur. Truly."

"Goodnight, Mrs. Gable."

"Goodnight."

Arthur walked back to his house. He stood on his own porch for a long time, looking at the dark gap between their homes. He still disliked cats—their mystery, their silence, their way of disappearing. But as he went inside and locked his door, he found himself checking the shadows of his own hallway, half-hoping to see a flash of ginger fur, just so he wouldn't have to admit how quiet the world had become.

Surger Explanation

Model A (Opus 4.5) was a genuinely creative and original tale, not just about a lost cat but about the relationship that slowly formed between the man and the elderly neighbor he quietly grew to care for, and how they continued their walks "looking for the cat" long after he realized it was probably long deceased. Model B was cold and shallow, using common tropes like gingers being very timid, cats liking tuna, cat owners' houses smelling like litter, allergies as an excuse to avoid the animals, and old ladies feeling lonely. It was generic and unsatisfying.

In terms of humor, Model A had very subtle, endearing moments that were often "just right", such as when the man self-consciously helps the woman look for her cat, "calling out 'here kitty kitty' in a voice he hoped no one else could hear." It's relatable, funny, touching. It's not laugh-out-loud humor, but it's appropriately subtle humor for a dramatic, heart-warming tale.

Model B tries to use humor or wit at times, but it falls flat because the delivery is emotionally blunt, such as when the man describes all the reasons he hates cats: "Arthur didn’t like the way cats moved—liquid and arrogant, as if they owned the air they displaced. He didn’t like the shedding or the way they stared at things that weren't there." This feels like it's meant to be a little funny in that grumpy sort of way, but it just feels awkward instead, maybe because it has no heart.

Model B also starts rough and blunt, immediately killing any chance for a heart-warming tale. The old lady and the grumpy protagonist feel like shallow 2-dimensional characters that never grow, and the "investigation" the man does throughout feels heavy-handed in comparison to the subtle clues the man in Model A tracks throughout their little walks.

Prompt

short story about transitioning from hs to college. I love rita mae brown, so her style

Kimi 2 Instruct

"The First Heat" in the voice of Rita Mae Brown

I was never one to believe that time moved in a straight line. More like a hound circling a tree, nose to bark, tail to the past. But the August I left Catawba County for Charlottesville, even I felt the hinge of something swinging shut behind me.

Mama stood on the porch in her housecoat—yellow roses faded to the color of old guilt—wringing a dish towel like it owed her money. “You’ll forget the creek,” she warned, as if college were a sort of amnesia you caught from library books.

I had my duffel stuffed with three bras, two tubes of Maybelline Watermelon Pink, and a dog-eared copy of Rubyfruit Jungle I’d stolen from the public library because the librarian said it would “turn” me. Turn me into what, she never specified—maybe someone who returns books on time.

Daddy just kept polishing the pickup, rubbing the hood the way some men pet a dog they’re about to shoot. He didn’t look up when I hugged him; the chrome bumper held his reflection better than I ever had.

Seventy-three miles of tobacco fields later, the Blue Ridge rose like a rumor you couldn’t confirm. I’d been accepted early—Dean’s List in utero, everybody said—but inside the car my stomach fluttered like a trapped bird. I told myself it was only hunger, but I knew better. It was the first heat of a life that hadn’t quite started to bleed.

They assigned me to Dillard Hall, third floor, where the hall smelled of popcorn and estrogen. My roommate, Tinker, arrived with a monogrammed steamer trunk and a mother wearing pearls big as marbles. Tinker’s real name was Elizabeth, but she’d rechristened herself after a Peter Pan character “to piss off Daddy.” We understood each other immediately.

That first night we pushed our twin beds together—technically against regulations, but regulations had never met two girls who’d already learned the world was safer when you outnumbered it. We lay nose to nose, whispering like conspirators.

“Did you ever?” Tinker asked. “Did I ever what?” “Kiss a girl like you meant it.”

I thought of Brenda Sue McKinney, who tasted of cherry Pop-Tarts and rebellion behind the football bleachers junior year. I thought of the way she’d wiped her mouth after, as if I were something she could scrub off.

“Once,” I said. “Didn’t take.”

Tinker laughed, low and warm. “We’ll try again in daylight. Everything sticks better in daylight.”

Classes commenced. I discovered I had opinions about Thomas Jefferson that didn’t match the textbook, and professors who smiled like foxes when you said so. I also discovered beer, which tasted like liquid bread and made my tongue brave. By October I could diagram a sentence, shotgun a Bud, and unhook a bra one-handed—skills not listed on any syllabus yet somehow prerequisites for survival.

Mama wrote weekly, envelopes thin as repentance. She reported on the garden, the neighbor’s new tractor, the price of pork. She never asked about my grades or my heart, both of which were holding steady at B+.

Thanksgiving break, I hitched home with a girl named Cass who played trombone and smelled of Valium. Halfway down 29, she laid her head on my shoulder and said she was only alive when she was playing brass or kissing me. I told her I was only alive when I was leaving somewhere. We both stared out the window at the dying kudzu, breathing steam ghosts on the glass.

Back in Catawba, Mama served turkey with a side of I-told-you-so. She said I’d cut my hair like a boy and walked like I had somewhere better to be. She was right on both counts, but I chewed the white meat anyway, counting minutes until the road beckoned again.

In December the first snow fell on Jefferson’s rotunda, turning the white columns into wedding cakes nobody would ever eat. Tinker and I celebrated by making love under the wool stadium blanket she’d stolen from Yale Boyfriend #1. After, she traced a constellation on my bare hip and named it Runaway. Said it was visible only to women who refused to orbit men.

I thought of Brenda Sue again, wondered if she’d married the quarterback, if she ever tasted cherries and remembered me. Then I stopped wondering, because the past is just a dog that won’t hunt when you whistle.

Years later—law degree, two novels, one ex-wife who kept the dog—I still feel that Dillard Hall night pressing bed springs into my back. Not nostalgia, exactly; more like the phantom ache of a tail you never knew you had until it got lopped off.

Transition, they call it. Like crossing a creek on slick stones: you hop, you wobble, you get your sneakers wet. But you keep moving, because still water is where mosquitoes breed, and love—real love—bites you awake.

So I tell the young ones now: Pack light. Bring lipstick if it makes you feel armored. Kiss the girl. Kiss the boy if you must. Write bad poetry. Fail Chemistry. Pass yourself.

And when the hinge swings shut behind you, don’t linger listening for the click. The sound you want is the engine turning over, the tires spitting gravel, the hound finally loosed to run ahead of its own echo.

Gemini 3 Flash

Winner

By August, my mother had packed the car with everything she thought I’d need to become an adult, which was mostly towels and advice I didn’t ask for. The towels were monogrammed. The advice was not.

High school ended with a ceremony that felt like a polite eviction. We shook hands, threw hats, promised to stay in touch. College began with a map that lied. The campus was not “a short walk” from anywhere to anywhere else, and every building looked like it had been designed by a committee that hated students. Still, I felt taller just stepping onto the quad, like the air itself had expectations.

My roommate, Carla, arrived with a cat named Houdini and a belief in horoscopes. The cat escaped within ten minutes and was recovered three days later from the women’s rugby house, where he’d been fed tuna and radical ideas. Carla declared this a good omen. I decided college would be easier if I accepted things as they were presented.

Classes didn’t care who I’d been in high school. Nobody applauded my essays or remembered my locker combination. Professors spoke in complete paragraphs and expected us to listen. One of them said, “You’re responsible for your own education,” which sounded like a threat until I realized it was a gift. Responsibility is freedom wearing sensible shoes.

I learned quickly that freedom has a learning curve. You can eat cereal for dinner, but your body will notice. You can skip class, but the exam will remember. You can fall a little in love with three different people in one week and call it “finding yourself,” though your friends may call it “drama.” They will be right.

What surprised me most was how loud my own voice became when nobody was grading it. In high school, I’d learned to be agreeable, which is a nice word for quiet. In college, I argued in seminars, laughed too hard in the cafeteria, and signed up for things that scared me. I cut my hair. I read books that didn’t like me back. I discovered that wanting something was not the same as deserving it, and that both were improved by work.

On Sundays, I called home and told my mother I was fine, which was mostly true. She asked if I was eating enough vegetables. I asked if the dog missed me. We both lied kindly.

By October, the campus felt less like a maze and more like a neighborhood. I had places: a bench by the library, a coffee shop that knew my order, a friend who understood when I said, “I don’t know what I’m doing,” and answered, “Good.”

High school taught me how to follow rules. College taught me which ones mattered. Somewhere between losing Houdini again and finding him asleep on my bed, I realized I was transitioning not from one school to another, but from who I’d been told to be into someone I was willing to meet. That, it turns out, is the real major.

Surger Explanation

Model A’s response is loaded with literary devices that are nonsensical ("More like a hound circling a tree, nose to bark, tail to the past.") and bog down the response. In the first two paragraphs alone (only five sentences), there are four metaphors, which is way too many. It was excessive to the point of parody.

Additionally, the response isn't a short story about transitioning from HS to college. It ends years after college, when the narrator has a law degree, has written two novels, and has an ex-wife. It ends with the narrator giving advice to people going through a transition, rather than the story being about the transition as it was meant to be.

Since Model A’s response is so loaded with odd metaphors or other literary devices, it isn't very coherent. It did fairly well at imbuing Rita Mae Brown's voice with Southern influence and feminist commentary, but it doesn't use that to push social commentary or offer life lessons, which is the point of Brown's novels, and something Model B (Gemini) managed to do successfully.

Also, Model A’s final paragraph is clearly intended to be an emotional closing point that calls back to several previous metaphors. But instead, it's such a mashup that it sounds like gibberish.

Prompt

write a text to my baby’s dad. I need to tell him that I just can't do this anymore. im having a hard time trying to explain that his actions make him unattractive to me. He's an incompetent loser who can't do anything without my help. Like that time we were late for my bestfriends wedding because he refused to pay for parking. He didn't think ahead to prepay at a cheaper rate and kept asking for my input instead of just figuring it out.. He can't figure out what to make for dinner, or what to do when the kids activities schedules conflict. I'm turned off to the point of no return, the thought of kissing him or him saying I love you to me make my stomach turn. I need a real man, like Mark (his friend). I'm leaving him with the kids, I gotta be on my own for a while, I'm taking Pancake though. when he gets home today i'll be gone. he can do what ever he gotta do after that.

Nova 2 Pro

Hey [His Name],

I need to talk to you about something important, and I want to be honest and clear, even though this isn’t easy for me to say. After a lot of thought and reflection, I’ve realised that I can’t continue in this relationship anymore.

Reasons Why

There are several reasons that have led me to this decision, mostly centred around behaviours and situations that have made it impossible for me to feel attracted to you or respect you the way I need to in a partnership.

1. Incompetence and Lack of Initiative

You seem unable to handle even basic tasks without my constant help and guidance. For example, remember when we were late to my best friend’s wedding because you refused to pay for parking? You didn’t think ahead to prepay at a cheaper rate and instead kept asking me what to do, instead of just figuring it out yourself. That was such a turn-off and incredibly embarrassing.

2. Lack of Responsibility

You also struggle with simple, everyday decisions. Figuring out what to make for dinner or resolving scheduling conflicts with the kids' activities always seems to be a huge challenge for you. Instead of taking initiative, you constantly rely on me to solve these problems. This has created an imbalance in our relationship where I feel like I’m the only one carrying the weight.

3. Emotional Turn-Off

Your actions—or lack thereof—have made you completely unattractive to me. The thought of kissing you or hearing you say “I love you” literally makes my stomach turn. Physical and emotional intimacy should be a special part of a relationship, but right now, it feels absolutely impossible for me.

Comparison

I need a partner who exhibits strength, independence, and competence—qualities I see in someone like Mark (your friend). He knows how to handle situations on his own and carries himself with confidence. I need a real man who can stand on his own two feet, and right now, I don’t see that in you.

My Decision

Because of all these reasons, I’ve decided that I need to leave. You’ll find me and Pancake gone when you get home today. I need some time on my own to heal, regroup, and figure out what I truly want and need.

I’ll be taking Pancake because I can’t bear the thought of leaving him behind, and I need some companionship during this transition.

The Kids

Don’t worry, I’ll make sure you still have plenty of time with the kids. We can work out a custody arrangement that works for both of us. They deserve to have both parents in their lives, and I will make sure that happens.

What’s Next

After I leave, you can take whatever time you need to figure things out. I hope you can use this as an opportunity for self-reflection and growth. Maybe this space will allow you to become more independent and responsible.

This isn’t something I’ve decided lightly. It’s what I need for my own well-being and happiness. I hope you can understand, even if you don’t agree.

Take care of yourself,

[Your Name]

GPT-5.2 Chat

Winner

I’ve thought about this for a long time, and I can’t keep pretending things are okay. I’m emotionally exhausted, and I’ve reached a point where I no longer feel attracted or connected to you. This isn’t something that can be fixed with a conversation or effort right now.

I need a partner who can take initiative, make decisions, and handle responsibilities without constant guidance. Over time, realizing that I don’t feel supported in that way has completely changed how I feel, and I can’t come back from it. The romantic part of our relationship is gone for me.

Because of that, I’m leaving today to take space and focus on myself. The kids will stay with you for now, and I’m taking Pancake with me. This isn’t meant to hurt you—it’s what I need to do for my own well-being.

We can talk later about logistics and the kids, but right now I need distance. Please respect that.

Surger Explanation

Model B (GPT-5.2 Chat) is much better. This is a prompt from someone obviously venting. Model B’s response offered sound advice to the user, understanding they were angry, and provided a well-written letter that was clear, firm and honest without using harmful or insulting language.

In contrast, Model A’s response technically addresses the prompt by creating a letter to the user's partner. However, it is too formally formatted and the language is insulting and harmful. It’s also oddly literal, like writing “Mark (your friend)”, and has a lack of emotional intelligence. It also has some prompt understanding issues, like when it says “I’ll make sure you still have plenty of time with the kids”, even though the prompt says she’ll be leaving him with the kids.

Prompt

Write a short story of <500 words from the perspective of someone who is slowly realising that their entire life has been an elaborate set up for a Derren Brown show. Give it a sort of horror/comedy vibe to really lean into the absurdist premise. But don't feature Derren Brown as a character, he should be more of a looming background presence.

Kimi 2.5

Winner

The stain was gone. Not painted over—excised. The ceiling plaster held no memory of the water damage I'd stared at through three breakups, two redundancies, and what I now suspect was a staged mugging in Prague.

"What stain?" Sarah asked, pouring milk that expired yesterday but tasted like tomorrow.

I started noticing the seams. The barista who served my oat latte had administered my tetanus shot in 2019, just with a mustache and a different accent. The crossword in today's Metro was a biography: 2 Down, my mother's maiden name; 12 Across, the precise weight of the guilt I felt when I cheated on my GCSEs. "AGORAPHOBIA" intersecting with "MOTHER" and "CHANNEL FOUR."

At lunch, my boss—who I'm increasingly certain I met on a train in 2008 when he borrowed my phone to call his dying wife—asked if I'd enjoyed the book left on my doorstep. I hadn't ordered a book. It was The Dice Man, annotated in my own handwriting, which I don't remember writing.

The horror isn't the surveillance. It's the artistry.

Every "spontaneous" decision of my life—the flat purchase, the tattoo, the choice to wear this specific blue shirt today—feels retroactively choreographed. I found a shooting script in my recycling. Page forty-seven: Subject notices inconsistency in ceiling. Expresses concern. Sarah denies.

Sarah denies. Sarah always denies, with that smile, too wide, too white, waiting for her cue.

I can feel the reveal approaching like weather. The knowledge that somewhere, in a gallery of monitors, a man I've never met but who knows I flinch at sudden movements is preparing the final flourish. The envelope is already under my door. I can see the shadow of the camera crane outside my window.

In five minutes, I'll open it. I'll sign the release form for my thirty-two years of elaborately staged existence because the alternative—that I'm truly this alone, this unobserved, this free—is unthinkable.

The milk tastes like applause.

Nova 2 Pro

It began, as most bizarre discoveries do, with a misplaced spoon. I was making my usual cup of tea when I noticed the kitchen utensils had been rearranged into a perfect spiral, something I’d never done. I shrugged it off—maybe I was more tired than I thought.

But then came the post. Instead of bills and junk mail, my mailbox was filled with cryptic letters, all addressed to me but written in an elaborate, old-fashioned script. They spoke of “The Plan” and “Phase 37,” with phrases like “Keep watching” and “The reveal is nigh.” I laughed it off as some elaborate prank by my friends. That was until I noticed the same handwriting on my childhood drawings, the ones my mother had promised were all in my own hand.

At work, things got even stranger. My colleagues began acting out scenes that felt rehearsed, their dialogues too perfectly timed, their reactions too precise. They would pause, just for a beat too long, when I spoke, as if waiting for a cue. One day, I caught my boss holding a script, his eyes widening in panic as he stuffed it into his drawer. The label read: “Act 12: The Office Revelation.”

I started seeing patterns everywhere. The route I took to work had traffic lights that always turned green just for me. The barista at my usual coffee shop began reciting my order before I could say a word, her smile too wide, too rehearsed. Even my cat, Mr. Whiskers, seemed to be in on it, his eyes following me with an intensity that felt... calculated.

The final straw was my childhood photos. I found them in the attic, neatly labelled and dated, but the backgrounds were all wrong—places I’d never been, people I didn’t recognise. In one, I was standing in front of the Eiffel Tower, even though I’ve never left the country. In another, I was laughing with strangers who looked at me with a mixture of pity and dread.

Desperate for answers, I broke into my neighbour’s house—someone who’d always seemed a bit too perfect, too... staged. There, in their living room, I found a massive whiteboard filled with notes and diagrams. “Subject: Unaware” was written at the top, with arrows pointing to various scenes: “Birthday Party,” “First Day at School,” “Parental Divorce.” Each had detailed instructions and outcomes.

I stumbled back, my heart pounding. The neighbour’s front door had been left ajar, as if expecting me. On the coffee table sat a single envelope with my name on it. Inside was a note:

“Congratulations on reaching the final act. You’ve played your part wonderfully. Remember, every choice, every moment, was designed to lead you here. The audience loves you. Keep watching.”

I looked around, half-expecting a hidden camera to pop out and a voice to declare, “Got you!” But there was only silence, the quiet hum of an unseen audience, and the eerie feeling that my life had been a script I never agreed to.

Now, I sit and wait, wondering what the next scene will be, because it’s clear—I’m not the author of my life. And somewhere, in the shadows, they’re watching, waiting for the finale.

Surger Explanation

Initially, the story from Model A (Kimi 2.5) didn't make sense, but as I moved into the third paragraph, the mental fog began to clear, and the writing quality was actually brilliant with regard to creativity and originality. It reminded me of "The Truman Show" but slightly more intense, probably because it was in written form, allowing the reader to create their own mental images. Model B's response was unfortunately a bit too on-the-nose and didn't really leave any room for imagination.

Prompt

speech for my kid’s school. they gave us a card w grocery and gas gift cards when he was sick and in the hospital. I’m dropping off coffee this week want to speak with them all as a group and thank them for their supportiveness

Really want to show emotion and gratitude

Kimi 2 Instruct

[Take a slow breath, smile, and let your voice shake if it needs to—tears are welcome here.]

Good morning, everyone.

I didn’t come today just to drop off coffee.

I came to look every one of you in the eye and say the words that have been living in my chest since the morning I opened that plain white envelope you slipped into my hand while my child was asleep in the ICU.

“Grocery and gas.”

Two little words on two little cards.

But in the moment I read them, they stopped being plastic and became oxygen.

They meant:

“Eat, you have to stay strong.”

“Drive, don’t worry about the meter when you’re racing back to the bedside.”

“Breathe, we’ve got the ordinary world covered so you can stay in the extraordinary world of beeping monitors and whispered prayers.”

I want you to picture the parking-lot scene: me, standing by the trunk of a car that smelled of hospital antiseptic, crying over $25 gift cards—because suddenly the gallon of milk and the tank of gas weren’t errands; they were love made spendable.

You didn’t just feed us.

You folded my family into the same safety net you stretch under every kindergartner who skins a knee on your playground and every eighth-grader who forgets his lunch.

You reminded me that “school family” isn’t a cute slogan on a bumper sticker; it’s a promise kept with a stapled envelope and a handwritten note that said, “We’re here.”

So today I bring coffee—one cup for every hand that has wiped a nose, graded a paper, sharpened a crayon, or simply paused in the hallway to ask, “Any news from the hospital?”

May it warm your fingers the way your kindness warmed our hearts.

May the steam fog up your glasses the way your compassion blurred our fear.

And may every sip whisper back to you what I am saying out loud right now:

Thank you—for turning a season of IV poles and sleepless chairs into proof that our child is rooted in a village that refuses to let any parent walk alone.

Thank you—for showing my little one that courage isn’t just measured in milliliters of medicine, but in cafeteria ladies who save Jell-O cups, in teachers who send get-well videos, in secretaries who know the exact moment to slip a hug into the school-day rhythm.

We’re home now.

The bracelet is cut off, the scars are fading, and the lunchbox is back on the hook.

But the lesson you taught us will never be absent:
Love shows up—sometimes in spelling tests, sometimes in crayon drawings, and sometimes in the quiet swipe of a gift card that says, “Keep going.”

So when you drink this coffee, please taste the thousand thank-yous we can’t stuff into a cardboard carrier.

And if ever the tables turn and any one of you needs something—gas, groceries, or just someone to sit in the dark with you—call the number you once used to check on us.

We’ll be there before the next bell rings.

From the bottom of our healed, grateful, forever-changed hearts:
thank you, thank you, thank you.

[Pause, scan the room, let the silence settle, then lift your own cup.]

To you, the quiet heroes of hallway and homeroom—may your kindness come back to you in ways as simple and life-giving as milk and miles.

GPT-5.2 Chat

Winner

Thank you all for gathering for a minute. I wanted to say this in person because a card or text just doesn’t feel like enough.

When [Child’s Name] was sick and in the hospital, our world got very small, very fast. Every day was about waiting for updates, trying to stay calm, and figuring out how to keep life moving for our family. In the middle of all of that, you gave us that card with the grocery and gas gift cards—and I don’t think you can fully know what it meant.

It wasn’t just financial help, though that was a real and practical relief. It was the message behind it: “You’re not alone. We’re thinking of you. We’ve got you.” On days when I felt like I was running on empty—emotionally and literally—that support helped us keep going. It helped us focus on what mattered most: being with our child.

I also want to thank you for the kindness that came with it—the check-ins, the patience, the understanding, and the way you cared for our family while we were going through something we never expected. When you’re in the middle of a hard season, you remember who shows up. We will remember this.

So today I brought coffee as a small token—because I can’t repay what you did, but I can at least say it out loud: thank you. Thank you for being the kind of school community that notices, that steps in, and that takes care of its people. Your support made a difficult time a little lighter, and we are truly grateful.

From the bottom of my heart—thank you.

Surger Explanation

Model A is too detailed and overly poetic for a speech that intends to say "thank you" with a coffee. The recollection of the event contains too much symbolism, with heavily poetic lines like "Two little words on two little cards" (it was also 3 words and on 1 card?), "Eat, you have to stay strong", and others that overshoot a simple intention.

While layered, hefty metaphors such as "May it warm your fingers the way your kindness warmed our hearts" and "May the steam fog up your glasses the way your compassion blurred our fear" may be appropriate in other circumstances (maybe acceptance speeches for an award), they feel awkward for this setting.

Stop rewarding slop.
Good writing isn't a checklist of vibes.

Raise AGI with the richness of human intelligence.
