TL;DR: Most AI implementations fail not because the technology is wrong but because the target is wrong. Three failure modes account for nearly every disappointment: automating the wrong task (picking what's annoying instead of what's describable), automating in the wrong sequence (skipping documentation and jumping to automation), and automating with the wrong expectation (expecting 100% automation or requiring humans to change behaviour they won't). The fix isn't better technology. It's better targeting. Five questions, asked before you build, predict whether the project will succeed or waste money.
The Tool That Worked Fine
A New York marketing agency implemented Expensify to manage employee expenses. Good tool. Well-reviewed. Widely used. It was abandoned after 18 months.
The agency had an expense policy with 12 core rules and 47 exceptions. Hotel limit: $250 per night. Unless it's a client meeting. Unless it's a conference. Unless it's international. Expensify enforced flat limits. It couldn't distinguish between an account manager exceeding the cap on a personal trip and the same account manager exceeding the same cap because she was meeting a client in Manhattan, which is a rather important distinction and one that the agency's actual policy had documented in detail on page seven.
Within two months, the finance director was overriding 40% of its alerts. Alert fatigue won. She stopped checking the flags. Eventually, the agency turned it off.
The agency's conclusion: "Expensify doesn't work for us."
The actual diagnosis: Expensify worked fine. They'd aimed it at the simple version of the problem (flag anything over $250) instead of the real version (apply 47 contextual exceptions to 12 rules consistently). The tool hit the target it was given. The target was wrong.
I've been observing this pattern for long enough now that I find it rather troubling, because it recurs with a consistency that suggests something structural rather than coincidental. The technology gets blamed. The technology is almost never the problem. The aim is. And the aim is almost always off in one of three predictable ways.
Wrong Target
The first failure mode is the most common and the most avoidable.
The business picks a task to automate because it's annoying, not because it's describable. "We hate expense reports" is a perfectly reasonable starting point for a conversation. It's a terrible starting point for an implementation. Because the question isn't "do we hate this?" It's "can you describe, step by step, what a human does to complete this task?" If the answer includes "it depends" more than twice, you're not ready. The task needs documentation before it needs automation.
The NYC agency hated expense reviews. Understandable. Rachel, the finance director, was spending 2-3 hours a week on them. So they automated enforcement with flat limits. But the actual enforcement process had 47 contextual exceptions that nobody had encoded anywhere. The tool made decisions based on incomplete rules, which is precisely the sort of thing that produces confident, consistent, wrong answers.
The Austin marketing agency made a different version of the same mistake with cold outreach. Their instinct: use AI to write better emails. The AI produced articulate, confident, empty messages. Reply rates dropped from 2.2% to 1.8%. The AI was aimed at the writing step. The actual bottleneck was the research step. When they automated research instead of writing, reply rates climbed to 8.7%. Same technology. Different aim. Completely different outcome.
Wrong target doesn't mean the task shouldn't be automated. It means the task hasn't been described precisely enough for automation to work. The NYC agency eventually built an expense enforcer that succeeded, but only after every exception was encoded into a decision tree. Same task. Same people. Better aim.
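To make "encoded into a decision tree" concrete, here's a minimal sketch of the gap between what Expensify enforced and what the policy actually said. The dollar figures and exception categories are illustrative, not the agency's actual rules; the shape of the problem is exactly this.

```python
from dataclasses import dataclass

@dataclass
class Expense:
    amount: float   # dollars charged
    category: str   # e.g. "hotel"
    context: set    # tags like {"client_meeting", "conference", "international"}

def flat_rule(e: Expense) -> bool:
    """What the tool enforced: one number, no context."""
    return e.amount <= 250

def contextual_rule(e: Expense) -> bool:
    """What the written policy said: the limit depends on why you're travelling."""
    if e.category != "hotel":
        return True                      # other categories have their own rules
    if "client_meeting" in e.context:
        return e.amount <= 450           # illustrative exception
    if "conference" in e.context:
        return e.amount <= 350           # illustrative exception
    if "international" in e.context:
        return e.amount <= 400           # illustrative exception
    return e.amount <= 250               # the base rule

# The same $310 night is a violation on a personal trip and compliant
# for a client meeting in Manhattan. A flat limit can't tell them apart.
night = Expense(310.0, "hotel", {"client_meeting"})
print(flat_rule(night), contextual_rule(night))   # False True
```

The flat rule is one line. The contextual rule is the policy. The distance between them is the 47 exceptions nobody had written down in executable form.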

Wrong Sequence
The second failure mode is subtler. Right task. Wrong order.
The sequence that works is always: document the process, standardise it, fix what's broken, then automate. Most businesses skip steps one through three and jump directly to four. The automation faithfully reproduces whatever mess existed before it arrived. Just faster. Which is, I suppose, a kind of improvement, in the same way that a car with a broken steering wheel is improved by a bigger engine.
The Portland accounting firm's engagement letters are the clearest example. If they'd automated the copy-paste method without adding a quality check stage, they'd have automated the fee errors. Four hundred and twelve letters with a 5.6% error rate, produced in an hour instead of three weeks. Faster errors. The agent succeeded because the first thing it did was compare each letter against the previous year's version. Fix the process, then automate it.
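The quality-check stage itself is unglamorous. A minimal sketch, with invented field names and an assumed 20% drift threshold: compare each fee against last year's letter and flag anything that moved suspiciously.

```python
def flag_fee_drift(current: dict, previous: dict, tolerance: float = 0.20) -> list:
    """Flag fees that are new or moved more than `tolerance` year on year."""
    flags = []
    for service, fee in current.items():
        old = previous.get(service)
        if old is None:
            flags.append(f"{service}: new line item at ${fee:,}")
        elif abs(fee - old) / old > tolerance:
            flags.append(f"{service}: ${old:,} -> ${fee:,} ({(fee - old) / old:+.0%})")
    return flags

last_year = {"tax_return": 1800, "bookkeeping": 3600}
this_year = {"tax_return": 8100, "bookkeeping": 3700}   # a plausible copy-paste error

for flag in flag_fee_drift(this_year, last_year):
    print(flag)   # tax_return: $1,800 -> $8,100 (+350%)
```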
The Melbourne recruitment agency couldn't track pipeline velocity until they standardised what "progressing" meant at each stage. Each consultant tracked differently. One consultant's "screening" was another consultant's "initial contact." The agent needed consistent definitions before it could measure anything meaningful. Without the standardisation step, it would've produced a dashboard full of numbers that meant different things depending on who'd entered them, which is, it turns out, exactly what the spreadsheet was already doing.
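Standardisation sounds abstract. In practice it's often a lookup table. A minimal sketch with illustrative stage names: map every consultant's private vocabulary onto one canonical list, and refuse to guess when a term isn't mapped.

```python
# Canonical pipeline stages. Names are illustrative, not the agency's.
CANONICAL_STAGES = {"sourced", "screening", "client_interview", "offer", "placed"}

# One consultant's "initial contact" was another's "screening".
STAGE_ALIASES = {
    "initial contact": "screening",
    "phone screen": "screening",
    "cv sent": "client_interview",
    "first round": "client_interview",
}

def canonical_stage(raw: str) -> str:
    """Normalise a consultant-entered stage name; refuse to guess on unknowns."""
    stage = raw.strip().lower()
    stage = STAGE_ALIASES.get(stage, stage)
    if stage not in CANONICAL_STAGES:
        raise ValueError(f"Unmapped stage {raw!r} - add it to STAGE_ALIASES")
    return stage

print(canonical_stage("Initial Contact"))   # screening
print(canonical_stage("  screening "))      # screening
```

The ValueError is the point. An unmapped stage should stop the pipeline and force a conversation, not silently become another number that means something different depending on who entered it.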
The Nashville law firm couldn't automate conflicts checking until the partners sat down and articulated their actual decision criteria. Not the criteria they thought they used. The ones they really applied. Three sessions over two weeks. Then the agent could be built.
Document. Standardise. Fix. Automate. In that order. Skipping a step doesn't save time. It guarantees that the automation inherits whatever was broken before it arrived, and it will execute that brokenness with impressive reliability.
Wrong Expectation
Two versions of this. Both fatal.
Expecting the AI to do everything. The belief that the agent should handle the entire process with no human involvement. This produces brittle systems that break on edge cases, erode trust, and get turned off within months. Every successful build in this series has a human in the loop. The Nashville partner decides on conflicts. Karen reviews engagement letters. Chris reviews cold emails. Rachel handles flagged expenses. Marcus makes financial decisions from the forecast.
The split is remarkably consistent: roughly 80% agent (the process, the data, the routine) and 20% human (the judgment, the exceptions, the decisions that require context a system can't access). Define the human role before building. If you can't describe what the human still does in two sentences, you haven't designed the system. You've designed a fantasy.
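Writing the escalation branch is the design exercise. A minimal sketch with invented thresholds: if you can't fill in the condition that sends a case to a named human, you haven't defined the 20%.

```python
review_queue = []   # what the human sees; everything else passes silently

def route(expense: dict) -> str:
    """Auto-approve what the rules fully cover; escalate the rest."""
    routine = expense["amount"] <= expense["limit"] and expense.get("receipt", False)
    if routine:
        return "auto_approved"          # the ~80%: process, data, routine
    review_queue.append(expense)        # the ~20%: judgment, exceptions, context
    return "flagged_for_review"

print(route({"amount": 180, "limit": 250, "receipt": True}))   # auto_approved
print(route({"amount": 310, "limit": 250, "receipt": True}))   # flagged_for_review
```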
Expecting humans to change behaviour. The Denver roofing company tried a photo tagging app. Crews were supposed to log in, select the job, and tag each photo before taking it. It lasted two weeks. Because crews are on roofs, not at desks. They're holding power tools, not browsing dropdown menus. The app required behaviour change that was fundamentally incompatible with the actual work of roofing.
The agent that succeeded didn't ask crews to do anything different. They were already texting photos. The agent just changed which number they texted them to. Everything downstream happened automatically. No new app. No login. No tagging. No behaviour change.
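The receiving end of "just change the number" is a small webhook. This sketch assumes a Twilio-style inbound MMS payload (From, NumMedia, MediaUrl0 and so on); lookup_job_for_crew and file_photo are hypothetical stand-ins for whatever job-matching and filing logic you actually have.

```python
from flask import Flask, request

app = Flask(__name__)

def lookup_job_for_crew(phone: str) -> str:
    """Hypothetical: map a crew member's number to their active job."""
    return "job-placeholder"

def file_photo(job: str, media_url: str) -> None:
    """Hypothetical: download the image and file it under the job."""
    print(f"filing {media_url} under {job}")

@app.route("/inbound-photo", methods=["POST"])
def inbound_photo():
    sender = request.form["From"]                      # the number they already text from
    job = lookup_job_for_crew(sender)
    for i in range(int(request.form.get("NumMedia", 0))):
        file_photo(job, request.form[f"MediaUrl{i}"])
    return "", 204   # no reply needed; nothing changes for the crew
```

Note what's absent: no login screen, no dropdown, no tagging step. The only thing that changed was the phone number.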
The Toronto general contractor tried a shared spreadsheet for sub tracking. Ray never used it. "I already know who to call." The spreadsheet required Ray to change how he worked. He didn't. The agent that succeeded extracted Ray's knowledge through two brain-dump sessions and built a system around how he already operated.
This is, I think, the single most predictable cause of AI implementation failure, and the easiest to prevent: if your implementation requires the end users to do something fundamentally different from what they're doing now, they won't. Not because they're resistant to change (the usual accusation). Because the new behaviour is incompatible with the constraints of their actual job. Design around existing behaviour. Not the behaviour you wish they had.

The Five Questions
Ask these before you spend a pound. The answers predict whether you'll save money or waste it.
Can someone describe the steps? Sit with the person who does the task. If they can walk through 5-15 clear steps, the task is automatable. If they say "it depends" more than twice, the task needs documentation first. This is the filter that catches wrong targets before they become expensive wrong targets.
Is the current process the right version? Or is it broken and you're about to automate the broken version? If the process has known errors, workarounds, or inconsistencies (and the Portland engagement letters, the Melbourne pipeline stages, and the Nashville conflicts criteria all did), fix those first. Automating a bad process doesn't fix it. It scales it.
What does the human still do? Not "everything" and not "nothing." The human reviews. Approves. Handles exceptions. Makes judgment calls that require context. If you can't describe the human's role in two sentences, the design isn't ready.
Does this require behaviour change? If the end users need to learn a new tool, follow a new workflow, or do something they weren't doing before, the implementation is at risk. The most successful agents in this series adapted to existing behaviour. Changed the output, not the input.
Does the maths work? Hours per week times hourly cost times 52 equals the annual cost of keeping it manual. If that number isn't at least 3x the agent cost, the project doesn't justify itself. A $190/month agent saving $22,000 per year is obvious. A $400/month agent saving $3,000 per year is a hobby.
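That last question is the only one a script can answer for you. A minimal check using the 3x threshold above; the 3 hours a week at $140 an hour are illustrative figures that land near the $22,000 example.

```python
def manual_cost_per_year(hours_per_week: float, hourly_cost: float) -> float:
    return hours_per_week * hourly_cost * 52

def worth_building(annual_saving: float, agent_cost_per_month: float) -> bool:
    """The 3x rule: the saving must be at least triple the agent's annual cost."""
    return annual_saving >= 3 * agent_cost_per_month * 12

saving = manual_cost_per_year(3, 140)      # ~$21,840 of manual work a year
print(worth_building(saving, 190))         # True: ~9.6x a $2,280/yr agent
print(worth_building(3_000, 400))          # False: the $4,800/yr hobby
```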
Three or more confident answers: ready. Two or fewer: groundwork needed. That groundwork isn't a delay. It's the difference between joining the list of businesses that succeeded and the rather longer list of businesses that concluded "the technology wasn't right for us."
The Sentence That's Always Wrong
Every failed AI project produces the same post-mortem: "The technology wasn't right for us."
It's almost never true. The technology was fine. Expensify handles expenses. Photo tagging apps organise files. AI writing tools produce fluent copy. They all do exactly what they were designed to do.
The failure was upstream. Wrong target: automating what was annoying instead of what was describable. Wrong sequence: jumping to automation before documenting, standardising, and fixing. Wrong expectation: expecting 100% from the AI or 100% behaviour change from the humans.
The businesses in this series that got it right didn't have better technology. They had better aim. They asked the questions first. They documented the exceptions before encoding them. They designed around actual behaviour. They defined the human role before building the agent role.
The difference between a failed implementation and a successful one isn't budget, sophistication, or industry. It's whether someone stopped long enough to ask "are we aiming at the right thing?" before pulling the trigger.
Which is, when you think about it, a rather inexpensive question to ask. Certainly cheaper than 18 months of Expensify.
Want to run the five questions against your specific business? The AI Workflow Diagnostic walks you through the checklist. Takes 10-15 minutes.
Want to see 37 real stories of businesses that got the aim right? Download Unstuck. Every one started with the five questions.
by SP, CEO - Connect on LinkedIn
for the AdAI Ed. Team


