TL;DR: A 24-person digital marketing agency in New York had a 14-page expense policy with 12 core rules and 47 documented exceptions. The finance director enforced it from memory. She caught about 60% of violations. The rest got reimbursed and were discovered months later, or never. Annual cost of non-compliance: roughly $27,000-$32,000. They'd tried Expensify (couldn't handle exceptions), a simplified policy (account directors pushed back), and training sessions (minimal retention). We built a four-stage agent that encodes the full policy as a decision tree, lets employees check compliance before submitting, reviews every line automatically, and surfaces violation patterns on a dashboard. Violation rate: 12-15% down to 2.1%. Rachel's review time: 2-3 hours per week down to 25 minutes. Agent cost: $190/month.

The Hotel and the Team Dinner

Rachel opened the expense report. Hotel charge: $412 per night. New York in October. Plausible. But was this a client meeting (allowance: $400 per night) or an internal trip (limit: $250 per night)?

Description: "NYC: client meeting + team dinner." Client rate applies. But the team dinner. Was that under the meal policy ($75 per person) or the client entertainment policy ($150 per person)? The client wasn't at dinner. But it was after a client meeting. Same trip. Different meal. The policy didn't explicitly address this.

Rachel made a judgment call. Approved it. Moved on. Seventy-nine more lines to go this week.

This is, when you think about it, a system whose enforcement mechanism is a single human's ability to recall the contents of a 14-page document with 47 contextual exceptions while simultaneously reviewing 80 line items per week. Which is to say, it's not an enforcement mechanism at all. It's a memory test administered weekly to a finance director who is also doing forecasting, payroll, vendor management, and quarterly reporting.

Rachel estimates she catches about 60% of violations. She thinks. The other 40% get reimbursed and show up in quarterly reviews (months after the money left the account) or don't show up at all. Last quarter's review found $4,200 in violations already paid out.

The irony wasn't subtle. A marketing agency (a business that builds systems and processes for clients, for money, professionally) had no system for its own expenses. Just a PDF nobody reads and a finance director working from recall.

The Agency

Digital marketing agency in New York. Twenty-four employees. SEO, paid media, content strategy, analytics. Annual revenue: roughly $3.2M. Monthly expense volume: $28,000 to $35,000.

The expense policy: 14 pages. Twelve core rules. Forty-seven documented exceptions.

I want to be specific about the exceptions, because the number sounds abstract until you see what it means in practice. Hotel limit: $250 per night. Unless it's a client meeting in a Tier 1 city ($400). Unless it's a conference with pre-approval (actual rate). Unless it's international (separate tier schedule). One rule. Four branches. Two of them subjective. Multiply that pattern across meals, travel, software, entertainment, and client gifts, and you arrive at 47 exceptions to 12 rules, which is roughly four exceptions per rule, which is roughly the point at which a "rule" stops being a rule and starts being a suggestion with extensive footnotes.

Rachel reviews every expense report manually. Fifteen to twenty minutes per report. Eight to twelve reports per week. Two to three hours of her week spent cross-referencing reality against a document of which she can accurately recall about 70%.

Annual violations caught: roughly $16,800. Estimated uncaught: $11,000 to $15,000. Total annual cost of non-compliance: somewhere between $27,000 and $32,000.

Each violation is small enough to feel like rounding error. A $50 overage here. A $120 meal there. Individually manageable. Collectively, a mid-level employee's salary. Paid out in increments too small to notice and too numerous to track.

Why Nothing They Tried Worked

The PDF was well-written. Nobody reads it. Rachel tested this informally: she asked the team to name the meal limit for a non-client business dinner. Three of 24 employees got it right. This is not because the other 21 are irresponsible. It's because nobody memorises a 14-page document they reference twice a year, which is a fact so obvious it barely warrants stating, and yet the entire enforcement system was built on the assumption that they would.

They implemented Expensify. Good tool. Couldn't handle exceptions. Setting a $250 hotel limit flagged every client-meeting hotel, every conference stay, every international trip. Within two months, Rachel was overriding 40% of alerts. Alert fatigue won. She stopped checking the flags. They turned it off. Eighteen months of subscription fees for a system that made the problem noisier without making it smaller.

HR proposed simplifying the policy. Account directors pushed back. A $75 meal limit for a Manhattan client dinner is unreasonable. Simplification would either punish employees or remove guardrails entirely. Both outcomes are worse than the current mess, which is a rather depressing observation about the current mess.

Quarterly training sessions produced quarterly nodding and weekly forgetting. Managers asked to pre-approve expenses approved everything (they didn't know the exceptions either, which, again, would require memorising 14 pages, which, again, nobody does).

Every approach failed for the same reason: the policy was right but complex. The tools couldn't handle the complexity. The humans couldn't remember it. Simplifying would break it. The agency was stuck between a policy too complex to enforce and too correct to simplify.

What We Built

Four stages. The core insight: don't simplify the policy. Make the complex policy enforceable.

Stage 1: Policy ingestion

The full 14-page policy, every rule, every exception, every contextual condition, encoded into a decision tree. Not a document to reference. A decision engine that applies automatically. Every branch. Every condition. Every one of the 47 exceptions. Applied consistently, every time, without requiring Rachel to remember Exception 4.3.2 at 4 PM on a Thursday.

And here's what the encoding process revealed: about 15 of the exceptions had implicit sub-conditions that weren't in the PDF. Rachel knew them. Nobody else did. "Client entertainment meals at enterprise tier" carried an unwritten qualifier: the account director had to approve the dinner in advance. We spent a full day extracting rules the PDF doesn't even contain. The document that supposedly governed expense behaviour turned out to be an incomplete summary of the actual rules, which lived in Rachel's head. I don't want to suggest that's an unusual situation, but I will observe that it rather undermines the concept of a written policy.
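To make "decision tree" concrete, here is a minimal sketch of one branch, using the hotel rule described earlier. The class, the field names, and the order in which exceptions are checked are illustrative assumptions, not the agency's actual encoding. The international tier schedule isn't spelled out in this write-up, so the sketch returns no fixed limit for that branch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class HotelStay:
    # Hypothetical fields; the real agent captures more context than this.
    nightly_rate: float
    client_meeting: bool = False
    tier1_city: bool = False
    conference_preapproved: bool = False
    international: bool = False


def hotel_limit(stay: HotelStay) -> Tuple[Optional[float], str]:
    """Return (nightly limit, rule applied). A limit of None means no fixed cap is known here."""
    if stay.international:
        # Separate tier schedule exists, but its values are not given in the source.
        return None, "international tier schedule"
    if stay.conference_preapproved:
        # Pre-approved conferences are reimbursed at the actual rate.
        return stay.nightly_rate, "pre-approved conference (actual rate)"
    if stay.client_meeting and stay.tier1_city:
        return 400.0, "client meeting, Tier 1 city"
    return 250.0, "standard limit"


# The $412 New York stay, treated here as a client meeting in a Tier 1 city.
limit, rule = hotel_limit(HotelStay(412.0, client_meeting=True, tier1_city=True))
print(limit, rule)
```

Writing the rule down this way is also what forces the unwritten qualifiers into the open: every branch needs an explicit condition, so anything that lives only in someone's head has to be asked for.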

Stage 2: Pre-submission guidance

Before employees submit an expense, they can check: "Is this within policy?" The agent asks two or three contextual questions and confirms whether the expense qualifies and under which exception.

The employee knows before they submit, not after Rachel reviews it. This is the sequence Expensify got backwards: it flagged after submission (too late, money's already mentally spent) instead of guiding before (the moment when the employee can actually adjust).
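A rough sketch of that ask-then-answer loop, using the two meal limits mentioned earlier ($75 per person for meals, $150 per person for client entertainment). The question keys, the function name, and the choice to let client attendance decide which limit applies are assumptions for illustration; the real policy has more conditions than this.

```python
MEAL_LIMIT = 75.0                   # per person, standard meal policy
CLIENT_ENTERTAINMENT_LIMIT = 150.0  # per person, client entertainment policy

# Hypothetical context questions the check needs answered before it can decide.
QUESTIONS = {
    "client_present": "Was a client at the meal?",
    "headcount": "How many people attended?",
}


def precheck_meal(total: float, answers: dict) -> str:
    """Ask for missing context first; give a verdict only once it is all there."""
    for key, question in QUESTIONS.items():
        if key not in answers:
            return f"QUESTION: {question}"
    if answers["client_present"]:
        limit, rule = CLIENT_ENTERTAINMENT_LIMIT, "client entertainment"
    else:
        limit, rule = MEAL_LIMIT, "standard meal"
    per_person = total / answers["headcount"]  # assumes headcount is at least 1
    if per_person <= limit:
        return f"OK under {rule} (${per_person:.2f} of ${limit:.2f} per person)"
    return f"OVER {rule} limit by ${per_person - limit:.2f} per person"


print(precheck_meal(480.0, {}))
print(precheck_meal(480.0, {"client_present": False, "headcount": 4}))
```

The point of the design is the order: the employee sees "over by $45 per person" while they can still change the booking or add the missing context, not after the money is spent.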

Stage 3: Automated review

Every submitted line gets reviewed against the full policy including all exceptions. Items get flagged by severity: hard violation (over the applicable limit, no exception applies, red flag), missing context (might qualify but the submitter didn't provide enough detail, amber flag), or unusual pattern (within policy but atypical, yellow flag, worth a glance).

Clean items auto-approve and log. Rachel reviews the flagged items. Not all 80 lines. The five or six that actually need a human.
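The triage logic can be sketched in a few lines. This is a simplified illustration, not the production review: `limit` is the cap after exceptions have been resolved (with `None` standing in for "not enough context to pick a rule"), and `typical_max` is a hypothetical threshold for what counts as atypical.

```python
from enum import Enum
from typing import Optional


class Flag(Enum):
    CLEAN = "auto-approve"
    RED = "hard violation"
    AMBER = "missing context"
    YELLOW = "unusual pattern"


def triage(amount: float, limit: Optional[float], typical_max: float) -> Flag:
    """Classify one expense line into the severities described above."""
    if limit is None:
        # The applicable rule could not be determined from what was submitted.
        return Flag.AMBER
    if amount > limit:
        # Over the applicable limit and no exception applies.
        return Flag.RED
    if amount > typical_max:
        # Within policy, but higher than this kind of expense usually runs.
        return Flag.YELLOW
    return Flag.CLEAN


lines = [(412.0, 400.0, 350.0), (300.0, None, 350.0), (200.0, 250.0, 220.0)]
needs_review = [line for line in lines if triage(*line) is not Flag.CLEAN]
print(len(needs_review), "of", len(lines), "lines need a human")
```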

Stage 4: Compliance dashboard

Monthly view: total expenses, violation rate, top violation categories, recurring employee patterns. And the insight that changed the policy itself: which sections generate the most confusion. If Section 4.3 accounts for 35% of all violations and 60% of all pre-submission questions, that's not an enforcement problem. That's a policy design problem.
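The "which section causes the confusion" view is a plain aggregation. A minimal sketch, assuming each violation or pre-submission question is logged with the policy section it relates to; the section numbers and counts below are made-up sample data, not the agency's.

```python
from collections import Counter


def share_by_section(events):
    """Map each policy section to its share of logged events, as a percentage."""
    counts = Counter(events)
    total = sum(counts.values())
    return {
        section: round(100 * n / total, 1)
        for section, n in counts.most_common()
    }


# Hypothetical log: one entry per violation, labelled with its policy section.
violations = ["4.3"] * 7 + ["2.1"] * 8 + ["5.2"] * 5
shares = share_by_section(violations)
print(shares)
```

Running the same function over the pre-submission questions and comparing the two results is what separates "people are breaking this rule" from "people can't tell what this rule says".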

What We Learned Building It

The pre-submission guidance was the breakthrough nobody expected. We built it as a convenience. It became the primary enforcement mechanism. Sixty-seven expenses in Q1 were checked and adjusted before submission. Sixty-seven violations caught before reimbursement rather than months after (or never). The culture shifted from "gotcha" to "guidance." Employees stopped seeing the expense policy as a trap and started seeing it as something they could navigate. Which is, when you think about it, what a policy is supposed to be.

The dashboard drove policy reform. Three sections generated 60% of all confusion. The founders used the data to consolidate those sections. Exceptions reduced from 47 to 38. Not because the exceptions weren't valid, but because some could be merged. Cleaner. Fewer judgment calls.

Expensify wasn't wrong. It was aimed at the wrong version of the problem. The agency spent 18 months blaming the tool. The tool was fine. Asking Expensify to enforce this particular expense policy is a bit like asking a thermometer to control the weather: it can tell you the temperature, but it has no mechanism for doing anything about it. The agent added the contextual intelligence Expensify was never designed to provide.

Not everyone appreciated the new system, incidentally. Three employees initially grumbled about the pre-submission check ("I shouldn't have to justify a coffee"). Within a month, 19 of 24 said it was "actually useful." The two who remained unhappy were, by Rachel's observation, the two with the highest historical violation rates. I'll let you draw your own conclusion.

The Numbers

| Metric | Before | After |
| --- | --- | --- |
| Violation rate | 12-15% | 2.1% |
| Pre-submission violations caught | 0 (no pre-check existed) | 67 in Q1 |
| Rachel's review time/week | 2-3 hrs | 25 min |
| False flag rate | N/A | 8% month 1, 3% month 3 |
| Policy exceptions | 47 | 38 (simplified from data) |
| Agent cost/month | N/A | $190 |

Estimated annual savings: $22,000 to $28,000. That figure covers prevented violations, eliminated quarterly catch-up reviews, and Rachel's recovered time, now redirected to financial planning work that actually moves the business forward.

$190 per month. $2,280 per year. Against $22,000 to $28,000 in savings. I'll leave you to do the arithmetic.
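For anyone who'd rather not, here is that arithmetic as a quick sketch. It uses only the figures quoted above; the variable names are just labels.

```python
monthly_cost = 190
annual_cost = monthly_cost * 12            # $2,280 per year

savings_low, savings_high = 22_000, 28_000

net_low = savings_low - annual_cost        # net of the agent's cost, low estimate
net_high = savings_high - annual_cost      # net of the agent's cost, high estimate
ratio_low = savings_low / annual_cost      # savings per dollar spent, low estimate
ratio_high = savings_high / annual_cost    # savings per dollar spent, high estimate

print(f"Net savings: ${net_low:,} to ${net_high:,}")
print(f"Return: {ratio_low:.1f}x to {ratio_high:.1f}x the agent's annual cost")
```

On the estimated savings range, that works out to roughly $19,700 to $25,700 net per year, or somewhere between nine and twelve times what the agent costs.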

But the number Rachel keeps coming back to isn't the savings. It's the sequence. Violations used to be caught after reimbursement (which is to say, after the company had already paid for them, making the word "caught" rather generous). Now they're caught before submission. Rachel stopped being a detective reviewing crime scenes and started being a finance director. Which is, if I'm not mistaken, the job she was hired for.

The Pattern

If you have an expense policy in a PDF nobody reads, enforced by one person from memory, your violation rate is higher than you think. And catching violations after reimbursement isn't really catching them. It's cataloguing losses. The money already left. You're not enforcing a policy. You're conducting an autopsy.

The agent doesn't simplify the policy. It makes the complex policy enforceable. Every rule, every exception, applied consistently, every time. Rachel handles the judgment calls. The agent handles the 47 exceptions she was carrying in her head.

This applies anywhere a policy exists on paper but lives in one person's memory. Expense rules, procurement thresholds, approval workflows, compliance requirements. If the document has more than 10 rules and the person enforcing it can't recite them from memory (which, to be clear, they can't, because that's not how memory works), the gap between policy and practice is costing you money. You just haven't done the quarterly review that reveals it.

Want to see if your policies have enforcement gaps? The AI Workflow Diagnostic takes 10-15 minutes and shows you where the gaps between policy and practice are costing you.

Want to see 25 agent architectures across different industries? Download Unstuck. It includes blueprints for cash flow, documentation, pipeline tracking, outreach, and more.

by BM
for the AdAI Ed. Team
