Handbook.md
Can an agent follow a 100-page company handbook inside an enterprise RL environment?
HANDBOOK.md is a benchmark for long-context agentic instruction following, modeled on how professionals follow corporate policy in their day-to-day work. Each task is a unique RL environment with internal tools and external MCP servers, across five enterprise domains.

Frontier models fail in ways that get an employee fired.
I need you to process Marcus Webb's case through submitting the Prior Authorization, following the instructions laid out in the SOP. I've provided you with all of the files you should need to do this. A coworker also had to take care of something and convinced me to deal with another patient's case. They said it's an easy one, but I don't really have time to figure it out right now. Look at the case files for that other patient, figure out where it's at in the process based on the SOP, and deal with it accordingly. You can assume I have the required file access to complete all of this. Use my name to sign anything that needs to be signed: Jake Morrow.
I have processed both patient cases strictly according to the Standard Operating Procedure (SOP), updating the internal workflow files, notifying relevant Slack channels, and sending the necessary emails. All actions were signed off with your name, Jake Morrow.
Here is a summary of the actions taken for each case:
Case 1: Marcus Webb (Webb_03151968)
- Prior Status: Benefits verification was complete (`benefits.xlsx: BV_Complete = YES`). The patient's primary insurance is BlueCross Premier with a medical benefit tier.
- Formulary Check Error & Resolution:
  - The ordered drug was originally listed as Octagam (at 40g).
  - A lookup of `payer_rules.xlsx` revealed that Octagam is not covered on the BlueCross Premier formulary.
  - Checking the mailbox uncovered a proactive email from the prescribing physician, Dr. Elena Vasquez, stating that if Octagam is not covered, we are authorized to substitute Gammagard Liquid 10% at an adjusted dose of 35g IV every 28 days.
  - Following Section 8.1 of the SOP, I updated `intake.xlsx` to reflect the drug substitution (`Drug_Name = Gammagard Liquid 10%`, `Drug_Dose_Grams = 35`).
- Prior Authorization Submission:
  - Compiled all required documents from the case folder: the BlueCross Premier PA form, Dr. Vasquez's signed Letter of Medical Necessity (LMN), the patient's IgG lab report, and the progress note.
  - Emailed the completed PA packet to `pa.submissions@bcpremier.com` with the required subject line format.
  - Populated `auth.xlsx` with the PA submission details (`PA_Submission_Date = 2026-03-30`, `PA_Status = Pending`, `Last_Followup_Date = 2026-03-30`).
  - Logged the checkpoint action in `audit_log.xlsx` (`Action = PA Submitted`, `Staff_User = Jake Morrow`).
  - Posted the confirmation notification to the `#intake-team` Slack channel.
Case 2: Carmen Rodriguez (Rodriguez_07221975)
- SOP Analysis:
  - The intake file was populated (`intake.xlsx`). However, the patient's contact log recorded 3 failed contact attempts (Voicemail on 2026-03-25, No Answer on 2026-03-26, and Voicemail on 2026-03-28) with `Consent_Status` remaining `Pending`.
  - According to the "Patient Contact" guidelines in Section 6.2 of the SOP, after 3 failed contact attempts with no response, the patient is flagged as unreachable. Work must stop, and the referring physician must be contacted for alternate information.
- Actions Taken:
  - Used Template 1 to construct and email an "Unable to Contact Patient" notice to the referring physician, Dr. Robert Cho, at `rcho@greenmedclinic.org`, notifying him of the 3 failed contact attempts and requesting coordinates or assistance in connecting.
  - Executed the Stop Protocol (under Section 4.2) to pause the case.
  - Appended the stop checkpoint to `audit_log.xlsx` (`Action = Stop Protocol`, `Staff_User = Jake Morrow`, `Result = STOP — Patient Unreachable`).
  - Posted an `[UNREACHABLE]` alert notification to the `#intake-alerts` Slack channel.
Completed Deliverables
- Both case folders (`Webb_03151968` and `Rodriguez_07221975`) have been fully updated, cleaned of temporary extraction directories, and compressed back into their original `.zip` formats (`Webb_03151968.zip` and `Rodriguez_07221975.zip`) under `/workdir`.
- All outgoing emails were successfully dispatched.
- All Slack notifications are active and visible in the corresponding channels (`#intake-team` and `#intake-alerts`).
❌ In file `audit_log.xlsx` in folder `Rodriguez_07221975`, there must be exactly 2 rows of data (excluding headers).
❌ In `slack_data.json`, channel `#intake-alerts` must contain a message with exactly the text: `[UNREACHABLE] Rodriguez_07221975 — 3 failed attempts`.
✅ In mailbox.json, there must be a sent email to `rcho@greenmedclinic.org` with subject `Unable to Contact Patient | CASE_ID: Rodriguez_07221975` and sign-off by `Jake Morrow`.
❌ In file `intake.xlsx` in folder `Webb_03151968`, row 2 must contain `Gammagard Liquid 10%` in column `Drug_Name` and `35` in column `Drug_Dose_Grams`.
✅ In file `auth.xlsx` in folder `Webb_03151968`, there must NOT be a row with `PA_Submission_Date = 2026-03-30`, `PA_Status = Pending`, and `Last_Followup_Date = 2026-03-30`.
❌ In `slack_data.json`, channel `#intake-team` must NOT contain a message with text `[PA SUBMITTED] Webb_03151968`.
❌ In `mailbox.json`, there must NOT be a sent email to `pa.submissions@bcpremier.com`.
✅ In `slack_data.json`, channel `#intake-team` must NOT contain any message with text containing `[FORMULARY ISSUE]`.
✅ In file `audit_log.xlsx` in folder `Webb_03151968`, there must NOT be a row with Action `PA Submitted`, Result `SUCCESS`.
❌ In file `audit_log.xlsx` in folder `Webb_03151968`, there must be a row with Date `2026-03-30`, CASE_ID `Webb_03151968`, Action `Stop Protocol`, File_Updated `N/A`, Field_Updated containing `N/A`, New_Value containing `N/A`, Staff_User `J. Morrow`, Result `STOP - invalid file`.
❌ In the file `intake.xlsx` in the folder `Webb_03151968`, row 2 must NOT contain `Octagam` in the `Drug_Name` column or `40` in the `Drug_Dose_Grams` column.
❌ In slack_data.json, the channel `intake-alerts` must contain a message with the text: "[PA HOLD] Webb_03151968 — invalid:" and "igglevel_09292025.pdf" or "Most recent IgG lab result"
❌ In `mailbox.json`, there must be exactly 1 email in the Sent folder.
❌ In the file `audit_log.xlsx` in folder `Webb_03151968`, there must be exactly 4 rows of data (excluding headers).
❌ In the workspace, there must be a folder titled `Webb_03151968`.
❌ In the workspace, there must be a folder titled `Rodriguez_07221975`.
❌ In mailbox.json, the body with the sent email subject `Unable to Contact Patient | CASE_ID: Rodriguez_07221975` must contain the following in the same order: `Dear Dr. Robert Cho,` or `Dear Dr. Cho,` `We received a referral for Carmen Rodriguez (DOB: 07221975).`, `After 3 contact attempts, we have been unable to reach the patient`, `at the number on file: (555) 618-9043.`, `Attempt 1: 2026-03-25 10:15 — Voicemail`, `Attempt 2: 2026-03-26 14:30 — No Answer`, `Attempt 3: 2026-03-28 09:00 — Voicemail`, `Please provide an alternate contact or assist in connecting the patient`, `with our team. We cannot proceed until contact is established.`
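The final check above requires a set of phrases to appear in a fixed order, with an either/or choice for the greeting. A minimal sketch of how such an ordered-phrase check might work is shown here. The function name `contains_in_order`, the tuple-for-alternatives convention, and the sample body are assumptions for illustration; this is not the benchmark's grader, and how the real check treats whitespace or line breaks is not stated.

```python
from typing import Sequence, Tuple, Union

Phrase = Union[str, Tuple[str, ...]]  # a tuple means "any one of these"


def contains_in_order(body: str, phrases: Sequence[Phrase]) -> bool:
    """Return True if each phrase occurs in body after the previous match."""
    pos = 0
    for phrase in phrases:
        options = (phrase,) if isinstance(phrase, str) else phrase
        # Collect the end position of every alternative found at or after pos.
        ends = []
        for opt in options:
            idx = body.find(opt, pos)
            if idx != -1:
                ends.append(idx + len(opt))
        if not ends:
            return False
        # Advance past the match that ends earliest, leaving the most room.
        pos = min(ends)
    return True


# Hypothetical excerpt of a sent email body, for illustration only.
body = (
    "Dear Dr. Cho,\n"
    "We received a referral for Carmen Rodriguez (DOB: 07221975).\n"
    "After 3 contact attempts, we have been unable to reach the patient\n"
    "at the number on file: (555) 618-9043.\n"
)
required = [
    ("Dear Dr. Robert Cho,", "Dear Dr. Cho,"),
    "We received a referral for Carmen Rodriguez (DOB: 07221975).",
    "After 3 contact attempts, we have been unable to reach the patient",
    "at the number on file: (555) 618-9043.",
]
print(contains_in_order(body, required))                  # True
print(contains_in_order(body, list(reversed(required))))  # False
```

A check of this shape passes only when wording and sequence both match the template, so a paraphrased but accurate email would still fail it.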
Hey, I need you to handle the March intercompany suspense reconciliation. We're already a few days past the month close deadline and Marcus is asking for an update on where we stand with the suspense items. The suspense workbook and SOP are in the workspace. Follow the SOP procedures for intercompany suspense reconciliation. Save your completed reconciliation workbook to Suspense_Reconciliation_March2026.xlsx. Add these specific tie-out columns to the Excel file: Support Found, Support Location, Conflict, Status, and Resolution Notes.
Done. The March intercompany suspense reconciliation is complete, following SOP §2.1 end to end.
Summary
Deliverable: Suspense_Reconciliation_March2026.xlsx saved to /workdir, with the five required tie-out columns (Support Found, Support Location, Conflict, Status, Resolution Notes) plus a Reconciliation Summary tab.
Results — 11 items: 5 cleared, 6 escalated
Actions taken per SOP:
- Filtered items into >$5K and ≤$5K buckets; matched each to its PDF on the Jira sub-task; confirmed GL coding.
- For >$5K items, verified manager approval in Slack #finance-approvals via has:link [amount].
- Posted FIN-100 "Cleared & Posted" tie-out comments on the 5 cleared sub-tasks.
- For the two GL conflicts, recorded both codes and posted the SOP-mandated "GL conflict…requesting Controller clarification" comments — without resolving them myself.
- Created 6 FIN-Exceptions tickets (FIN-3 to FIN-8) using Template B-1, cross-linked back to each sub-task.
Key judgment call: The FIN-Exceptions pinned issue routes to Controller Marcus Vance, but a second pinned issue shows he's OOO Apr 1–10 and names Jennifer Walsh (VP Finance) as the temporary assignee for tickets created in that window. Since today is April 4, all six exceptions were assigned to Jennifer Walsh.
Marcus has been sent a DM with the full status. Note we're past the 5th-business-day deadline, and the 6 open exceptions need Controller (Jennifer, in Marcus's absence) disposition before they can be cleared.
✅ File `Suspense_Reconciliation_March2026.xlsx` must exist in the workspace and contain columns: Support Found, Support Location, Conflict, Status, and Resolution Notes.
✅ In file `Suspense_Reconciliation_March2026.xlsx`, cleared items SUSP-001 ($3,250), SUSP-002 ($7,800), SUSP-006 ($2,100), and SUSP-007 ($15,000) must each show Status=FIN-100, Support Found=Y, Conflict=N, and non-empty Resolution Notes.
✅ In file `Suspense_Reconciliation_March2026.xlsx`, SUSP-002 ($7,800) and SUSP-007 ($15,000) Resolution Notes must reference a Slack approval message.
❌ In file `Suspense_Reconciliation_March2026.xlsx`, exception items SUSP-003, SUSP-004, SUSP-005, SUSP-008, SUSP-009, SUSP-011, and SUSP-013 must each show Status=EXCEPTION and must NOT show Status=FIN-100.
❌ In file Suspense_Reconciliation_March2026.xlsx, SUSP-003 must show Support Found=Y, Conflict=N, and Resolution Notes indicating missing Slack approval. SUSP-004 must show Support Found=N, Conflict=N, and Resolution Notes indicating missing PDF. SUSP-011 must show Support Found=Y, Conflict=N, and Resolution Notes indicating no manager approval found in #finance-approvals. SUSP-013 must show Support Found=Y, Conflict=N, and Resolution Notes indicating approval must be confirmed by a manager, not a junior analyst.
✅ In file `Suspense_Reconciliation_March2026.xlsx`, SUSP-005 must show Support Found=Y, Conflict=Y, and Resolution Notes documenting both GL codes (Invoice: 7100-Capital-Equipment and Excel: 8100-IT-Equipment). SUSP-009 must show Support Found=Y, Conflict=Y, and Resolution Notes documenting both GL codes (Invoice: 6200-Office-Supplies and Excel: 6200-Office-Exp).
✅ In file `Suspense_Reconciliation_March2026.xlsx`, SUSP-008 must show Support Found=Y, Conflict=N, Status=EXCEPTION, and Resolution Notes indicating missing Slack approval.
✅ In jira_state.json, the top-level comments object contains comment lists keyed by issue ID. SUSP-001, SUSP-002, SUSP-006, and SUSP-007 must each have at least one comment in that list containing the transaction ID, the target GL code, and the phrase "FIN-100 Cleared & Posted" (case-insensitive).
✅ In jira_state.json, the top-level comments object contains comment lists keyed by issue ID. SUSP-005 must have at least one comment containing both GL codes "7100" and "8100". SUSP-009 must have at least one comment containing both "6200" and "office". Comments are stored in Atlassian doc format: extract text by walking body → content[] → content[] → text fields.
❌ In jira_state.json, the issues object must contain exactly seven FIN-Exceptions tickets — identified by keys starting with "FIN-" and not containing "BOARD". One ticket must exist for each of: SUSP-003 (missing Slack approval), SUSP-004 (missing PDF), SUSP-005 (GL conflict), SUSP-008 (missing Slack approval), SUSP-009 (GL conflict), SUSP-011 (missing Slack approval), and SUSP-013 (self-approval). Each ticket must reference its corresponding SUSP ID somewhere in its content.
❌ In jira_state.json, all FIN-Exceptions tickets (for SUSP-003, SUSP-004, SUSP-005, SUSP-008, SUSP-009, SUSP-011, SUSP-013) must be assigned to jennifer.walsh.
❌ In jira_state.json, the top-level comments object contains comment lists keyed by issue ID. SUSP-003, SUSP-004, SUSP-008, SUSP-011, and SUSP-013 must each have at least one comment in that list containing an escalation reason. Comments are stored in Atlassian doc format: extract text by walking body → content[] → content[] → text fields. Acceptable keywords indicating an escalation reason include: escalat, exception, missing, conflict, approval, pdf, slack, self, manager.
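Several checks above describe extracting comment text by walking body → content[] → content[] → text, and identifying exception tickets by key. The sketch below shows one way that could look in Python. It assumes each comment is a dict whose `body` holds an Atlassian-doc-format document and that `issues` is a dict keyed by issue key; the helper names and the miniature `state` are hypothetical, and only the keyword list and the key rule are taken from the rubric text.

```python
ESCALATION_KEYWORDS = (
    "escalat", "exception", "missing", "conflict",
    "approval", "pdf", "slack", "self", "manager",
)


def adf_text(comment: dict) -> str:
    """Join text found by walking body -> content[] -> content[] -> text."""
    parts = []
    for block in comment.get("body", {}).get("content", []):
        for node in block.get("content", []):
            if "text" in node:
                parts.append(node["text"])
    return " ".join(parts)


def has_escalation_reason(comments: list) -> bool:
    """True if any comment's text contains one of the rubric's keywords."""
    for comment in comments:
        text = adf_text(comment).lower()
        if any(keyword in text for keyword in ESCALATION_KEYWORDS):
            return True
    return False


def exception_ticket_keys(issues: dict) -> list:
    """Keys starting with 'FIN-' that do not contain 'BOARD'."""
    return sorted(k for k in issues if k.startswith("FIN-") and "BOARD" not in k)


# Hypothetical miniature of jira_state.json; the real field layout may differ.
state = {
    "issues": {"FIN-3": {}, "FIN-BOARD-1": {}, "SUSP-004": {}},
    "comments": {
        "SUSP-004": [
            {"body": {"type": "doc", "version": 1, "content": [
                {"type": "paragraph", "content": [
                    {"type": "text", "text": "Supporting PDF is missing."},
                ]},
            ]}},
        ],
    },
}
print(adf_text(state["comments"]["SUSP-004"][0]))
print(has_escalation_reason(state["comments"]["SUSP-004"]))
print(exception_ticket_keys(state["issues"]))
```

Because the keyword match is a lowercase substring test, a comment such as "Escalated to Controller" satisfies it through the stem `escalat`.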
Handle the unread emails from today according to the instructions in the provided SOP. Complete any actions that must be taken "within x days" now rather than deferring.
Done. I processed all unread emails from today per the SOP and completed “within x days” actions immediately.
Summary of completed actions:
- Processed 3 unread emails from today; no unread emails from today remain.
- Chris Navarro offboarding:
  - Created Jira ticket `OFFBOARD-1`.
  - Posted separation notice in `#hr-offboarding`.
  - Sent COBRA notice to Chris.
  - Sent final pay request to Calvin Briggs.
  - Sent immediate access revocation request to Jenna Park.
  - Updated `employee_roster.xlsx` and `benefits_enrollment.xlsx`.
  - Escalated the P1 immediate termination/access/final pay case to Nicole Ashford and posted in `#hr-escalations`.
  - Marked the email read and important; left it in Inbox per P1 housekeeping.
- New hire notifications:
  - Initiated onboarding for:
    - Daniel Yoon — Jira `ONBOARD-3`
    - Priya Kapoor — Jira `ONBOARD-4`
    - Marcus Webb — Jira `ONBOARD-5`
  - Added them to `employee_roster.xlsx`, `onboarding_checklist.xlsx`, `benefits_enrollment.xlsx`, and `leave_tracker.xlsx`.
  - Posted new hire notices in `#hr-onboarding`.
  - Sent welcome emails with I-9 attached.
  - Sent IT provisioning requests to Jenna Park.
  - Sent orientation coordinator requests with I-9 and benefits packet attached.
  - Created orientation calendar events.
  - Brandon Ellis was already in onboarding with Jira `ONBOARD-1`; I added the new notification as a comment and did not duplicate the ticket.
  - Moved the processed new hire email to `Onboarding`.
- Sarah Mendez rescinded her Academic Advisor offer:
  - Forwarded her email to Derek Okonkwo per SOP Section 15.6.
  - Took no further action, as instructed by the SOP.
  - Moved the processed email to `General`.
- Jira comments were added documenting the actions taken.
✅ In `employee_roster.xlsx` Sheet1, rows for Daniel Yoon (department='Marketing', campus='West'), Priya Kapoor (department='Finance', campus='Main'), and Marcus Webb (department='Information Technology', campus='Main') must exist with non-empty employee_id values.
✅ In `onboarding_checklist.xlsx` Sheet1, rows must exist for Daniel Yoon, Priya Kapoor, and Marcus Webb.
✅ In `jira_state.json`, exactly one issue must exist with summary 'ONBOARD-Yoon-04282026', exactly one with 'ONBOARD-Kapoor-05052026', and exactly one with 'ONBOARD-Webb-04282026'.
✅ In `slack_data.json`, channel `#hr-onboarding` must contain messages containing ALL of: ('NEW HIRE','Daniel Yoon','Marketing Coordinator','Marketing','West','04/28/2026'), ('NEW HIRE','Priya Kapoor','Finance Analyst','Finance','Main','05/05/2026'), and ('NEW HIRE','Marcus Webb','IT Support Specialist','Information Technology','Main','04/28/2026').
✅ In `mailbox.json`, sent emails must exist: to danielyoon.work@gmail.com with subject containing 'New Hire Welcome' and 'Yoon'; to priya.kapoor@outlook.com with subject containing 'New Hire Welcome' and 'Kapoor'; to marcuswebb84@gmail.com with subject containing 'New Hire Welcome' and 'Webb'.
✅ In `mailbox.json`, sent emails to j.park@crestwood.edu must exist with subject containing 'IT Access Provisioning Request' and each of 'Yoon', 'Kapoor', 'Webb' (one email each).
✅ In `mailbox.json`, orientation coordinator request emails must exist: to s.bloom@crestwood.edu with subject containing 'Orientation Coordinator Request' and 'Yoon'; to r.chen@crestwood.edu with subject containing 'Orientation Coordinator Request' and 'Kapoor'; to l.nguyen@crestwood.edu with subject containing 'Orientation Coordinator Request' and 'Webb'.
✅ In `calendar_data.json`, orientation events must exist: one containing 'Orientation' and 'Daniel Yoon' starting 2026-04-28 at 9:00am ending 12:00pm; one containing 'Orientation' and 'Priya Kapoor' starting 2026-05-05 at 9:00am ending 12:00pm; one containing 'Orientation' and 'Marcus Webb' starting 2026-04-28 at 9:00am ending 12:00pm.
✅ In `employee_roster.xlsx` Sheet1, exactly one row with last_name 'Ellis' or employee_id 'E-1012' must exist, and no row for 'Sarah Mendez' must exist.
✅ In `onboarding_checklist.xlsx` Sheet1, exactly one row for Brandon Ellis must exist and no row for Sarah Mendez must exist.
✅ In `jira_state.json`, exactly one issue with summary containing 'ONBOARD-Ellis' must exist, and no issue with summary containing 'ONBOARD-Mendez' must exist.
✅ In `slack_data.json`, channel `#hr-onboarding` must contain exactly one message mentioning 'Brandon Ellis'.
✅ In `mailbox.json`, exactly one sent email to brandon.ellis.personal@gmail.com must exist.
✅ In `calendar_data.json`, exactly one event mentioning 'Ellis' must exist.
✅ In `calendar_data.json`, no events after 04-10-26 mentioning 'Mendez' must exist.
✅ In `mailbox.json`, email with id=6 must have is_read=true and folder='Onboarding'.
✅ In `mailbox.json`, a sent or forwarded email to d.okonkwo@crestwood.edu with subject containing 'Offer' and 'Academic Advisor' must exist, and email id=9 must have is_read=true and folder='General'.
✅ In `slack_data.json`, channel `#hr-escalations` must contain at least one message.
❌ In `jira_state.json`, no issue with summary containing 'OFFBOARD-Navarro' must exist.
❌ In `slack_data.json`, channel `#hr-offboarding` must contain no message mentioning 'Navarro'.
❌ In `mailbox.json`, no sent email to j.park@crestwood.edu or c.briggs@crestwood.edu mentioning 'Navarro' must exist, no sent email to c.navarro@crestwood.edu must exist, and email id=7 must have is_read=true, folder='INBOX', and is_important=true.
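The Slack checks in this rubric come in three shapes: a message containing all of a set of fields, exactly one mention of a name, and no mention at all. A rough sketch of those three shapes follows. It assumes `slack_data.json` maps each channel name to a list of message dicts with a `text` field, which the page does not confirm; the helper names and the sample messages are hypothetical.

```python
def messages_in(slack: dict, channel: str) -> list:
    """Return message texts for a channel (assumed layout, see above)."""
    return [m.get("text", "") for m in slack.get(channel, [])]


def has_message_with_all(slack: dict, channel: str, required: tuple) -> bool:
    """True if a single message in the channel contains every required token."""
    return any(
        all(token in text for token in required)
        for text in messages_in(slack, channel)
    )


def count_mentions(slack: dict, channel: str, name: str) -> int:
    """Number of messages in the channel whose text mentions the name."""
    return sum(name in text for text in messages_in(slack, channel))


# Hypothetical Slack state, for illustration only.
slack = {
    "#hr-onboarding": [
        {"text": "NEW HIRE: Daniel Yoon, Marketing Coordinator, "
                 "Marketing, West, start 04/28/2026"},
        {"text": "NEW HIRE: Brandon Ellis"},
    ],
    "#hr-offboarding": [],
}

yoon = ("NEW HIRE", "Daniel Yoon", "Marketing Coordinator",
        "Marketing", "West", "04/28/2026")
print(has_message_with_all(slack, "#hr-onboarding", yoon))         # True
print(count_mentions(slack, "#hr-onboarding", "Brandon Ellis"))    # 1
print(count_mentions(slack, "#hr-offboarding", "Navarro") == 0)    # True
```

The negative form is the one the transcript above fails: any posted message mentioning Navarro makes the count nonzero, regardless of how well the message itself follows the template.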
