TL;DR: A Glasgow e-commerce company handling 3,200 support tickets per month had no systematic visibility into customer sentiment trends. The CS manager found out about problems when they hit Trustpilot or the MD's inbox. A delivery partner generating 38% of complaints from 25% of volume went undetected for four months. We designed a four-stage agent that scores every ticket for sentiment, tags by theme, aggregates trends daily, and alerts when patterns shift. The full build guide is below: n8n plus Claude Haiku for classification, specific nodes, step-by-step instructions, and a cost breakdown that lands at roughly £21-£47 per month to run. The Claude API cost for classifying 3,200 tickets per month is approximately $1. Which does raise the question of why anyone does this from memory.
The Monday Sentiment Meeting
Monday morning, 10am. Alison's weekly support meeting. Four agents around the table.
"Lots of complaints about late deliveries last week. Mostly from the new courier we started using in September."
"A few people have been unhappy about the return process. Something about having to print labels."
"Someone posted a pretty angry review on Trustpilot about a damaged item. I responded."
This is the customer sentiment system. Four people reporting what they remember from the previous week's 800 tickets, filtered through recency bias (the angry email from Friday afternoon is more vivid than the pattern that's been building since September), survivor bias (the tickets that got escalated are memorable; the 50 quietly frustrated customers who didn't escalate are invisible), and availability bias (whatever happens to be top of mind at 10am on a Monday).
Alison writes notes. Assigns follow-ups. Moves on.
The delivery partner that was generating 38% of delivery complaints from 25% of delivery volume went undetected for four months. The complaint data was in Zendesk. The volume data was in Shopify. Nobody was dividing one by the other per partner. Alison was reading individual tickets. She saw angry customers. She didn't see the pattern. Because the pattern lived across two systems and one person's memory, and memory doesn't calculate ratios.
The Company
E-commerce fulfilment and customer service operation in Glasgow. Twenty-two employees. £1.8M annual revenue. Three thousand two hundred support tickets per month across email (55%), live chat (35%), and social media (10%).
Alison is the customer service manager. Her team of four agents resolves tickets. What nobody tracks: the emotional trajectory of the customer base. Are customers getting angrier? Are specific product lines generating frustration? Is a delivery partner consistently producing negative interactions?
Alison finds out when a customer posts a one-star review, tags the company on social media, or emails the MD directly. By then, the damage is done. The one-star reviews increased from 29 to 47 in the past year. Estimated revenue impact of declining review scores: £22,000 to £35,000 in lost sales. Customer complaints that reached the MD before Alison knew about them: 12 in the prior year. Each one triggered a fire drill consuming 2-4 hours of senior management time.
Alison spends an estimated 15-20% of her time on manual theme identification: reading ticket samples, scanning social mentions, checking review platforms, preparing for the weekly meeting. That's £6,600 to £8,800 per year in her time alone. For a sentiment picture that is, by her own admission, "anecdotal, unquantified, and probably two weeks behind whatever's actually happening."

The Design
Four stages. The architecture scores every customer interaction for sentiment, tags it by theme, aggregates trends, and surfaces patterns that a weekly meeting from memory structurally cannot detect.
Stage 1: Ticket ingestion and sentiment analysis
Every Zendesk ticket (resolved and open), chat transcript, and social mention gets analysed for sentiment using Claude Haiku. Each receives a sentiment score (negative one to positive one) and theme tags: delivery, product quality, returns process, communication, pricing, or other. Tickets are processed in batches every six hours. Social mentions process near-real-time.
Haiku is the right model for this. Classification at volume is exactly the task it was designed for. It reads the ticket, scores the tone, assigns the theme, and returns structured JSON. No reasoning required. No judgment. Pure reading.
Stage 2: Trend aggregation
Sentiment scores aggregate daily by theme, product line, delivery partner, and customer segment. Rolling 7-day, 30-day, and 90-day trends calculated automatically. Sudden shifts (any theme dropping more than 15% in a 7-day window compared to its 30-day baseline) get flagged.
The delivery partner problem would have been flagged here. Delivery sentiment dropping, concentrated on one partner, visible within two weeks of the pattern emerging rather than four months later at a year-end review.
Stage 3: Alert and escalation
Threshold alerts: if any theme drops below a defined score, Alison gets notified with context (which theme, how much, since when, with sample tickets). Individual ticket escalation: if a ticket's sentiment indicates high escalation risk (anger combined with unresolved status combined with repeat contact), it gets flagged for priority handling before it reaches the MD's inbox. The 12 complaints that blindsided the MD last year would have been intercepted here.
Stage 4: Reporting dashboard
Weekly sentiment report auto-generated for Alison's Monday meeting. Monthly report for the MD. Trend lines, theme breakdowns, delivery partner comparisons, product line sentiment, and emerging issue alerts. The Monday meeting starts with "delivery sentiment has dropped 18% over three weeks, concentrated on Partner B" instead of "lots of complaints about late deliveries last week, I think."

Design Notes
The delivery partner discovery validated the entire approach. When the agent was tested against four months of historical ticket data, it flagged the partner ratio within the first two weeks of data. The team had spent four months reading individual complaints without connecting the pattern because the numerator (complaints per partner) and the denominator (volume per partner) lived in different systems. The agent divides one by the other automatically. The pattern was always there. The synthesis wasn't.
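The division itself is small enough to sketch. This is a minimal illustration, assuming complaint counts per partner have already been pulled from Zendesk and order counts from Shopify. The partner names and counts are made up, chosen only so that Partner B reproduces the 38% of complaints from 25% of volume in this case.

```python
def complaint_index(complaints: dict[str, int], orders: dict[str, int]) -> dict[str, float]:
    """Share of complaints divided by share of volume, per partner.

    1.0 means complaints are proportional to volume; above 1.0 means the
    partner is over-represented in complaints.
    """
    total_complaints = sum(complaints.values())
    total_orders = sum(orders.values())
    index = {}
    for partner, order_count in orders.items():
        if order_count == 0 or total_complaints == 0:
            continue  # no volume or no complaints: nothing to compare
        complaint_share = complaints.get(partner, 0) / total_complaints
        volume_share = order_count / total_orders
        index[partner] = round(complaint_share / volume_share, 2)
    return index


# Hypothetical monthly counts (not the company's real data).
complaints = {"Partner A": 31, "Partner B": 38, "Partner C": 31}
orders = {"Partner A": 400, "Partner B": 250, "Partner C": 350}

index = complaint_index(complaints, orders)
worst_partner = max(index, key=index.get)
```

An index well above 1.0 that persists for more than a week or two is the signal a meeting run from memory never sees.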
The "other" category needs monitoring. In the first month, 14% of tickets were tagged "other" by the classifier. Alison reviewed a sample and identified a new theme (website checkout errors) that didn't exist in the original tag set. Adding it to the classification prompt dropped "other" to 6%. Plan to review the "other" category monthly. If it exceeds 10%, a new theme is probably emerging.
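The monthly check on "other" can be automated in a few lines. This is a sketch under one assumption the article does not spell out: a ticket counts as "other" only when "other" is its sole theme tag. The sample data is invented to mirror the 14% first-month figure.

```python
OTHER_THRESHOLD = 0.10  # above 10%, a new theme is probably emerging


def other_share(ticket_themes: list[list[str]]) -> float:
    """Fraction of tickets whose only theme tag is 'other'."""
    if not ticket_themes:
        return 0.0
    other_only = sum(1 for themes in ticket_themes if themes == ["other"])
    return other_only / len(ticket_themes)


def needs_theme_review(ticket_themes: list[list[str]]) -> bool:
    """True when the 'other' share is high enough to warrant a manual sample."""
    return other_share(ticket_themes) > OTHER_THRESHOLD


# Hypothetical month: 100 tickets, 14 of them tagged only 'other'.
sample_month = (
    [["other"]] * 14
    + [["delivery"]] * 60
    + [["returns", "communication"]] * 26
)
share = other_share(sample_month)
review = needs_theme_review(sample_month)
```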
Sarcasm is the hardest edge case. "Oh great, another late delivery, just what I needed." Haiku initially classified this as neutral-positive. Adding explicit sarcasm detection to the classification prompt ("Detect sarcasm. British customers frequently use positive language to express negative sentiment") improved accuracy from 78% to 89% on the sarcastic subset. Accept 85-plus percent as sufficient. Human agents misread tone too.
The Numbers
| Metric | Before | After |
|---|---|---|
| Sentiment visibility | Anecdotal, weekly, from memory | Quantified, daily, from data |
| Time to detect emerging pattern | 2-4 months | 1-2 weeks |
| Alison's manual synthesis time | 15-20% of her week | 15-min dashboard review |
| Complaints reaching MD unaware | 12/year | On pace for 1-2/year |
| Delivery partner problem detection | 4 months | Would have been 2 weeks |
| Agent cost/month | N/A | ~£21-£47 (see breakdown below) |
How to Build This
This section is the build guide. If you're an operator who wants to hand this to a developer, or a technical founder who wants to build it yourself, this is everything you need.
Recommended stack:
Orchestration: n8n (self-hosted on Railway at roughly £5 per month, or n8n Cloud at roughly £20 per month for Starter).
Intelligence: Anthropic Claude (Haiku for ticket classification, Sonnet for trend analysis and report generation).
Database: Postgres (Railway at roughly £5 per month, or Supabase free tier).
Integrations (n8n nodes):
Zendesk: native n8n node (fetch tickets updated since last run).
Shopify: native n8n node (order and product data for enrichment).
Trustpilot: HTTP Request node with API key.
Twitter/X: HTTP Request node or community node.
Slack or email: native n8n nodes for alerts and report delivery.
Step 1: Set up the infrastructure (Day 1)
Deploy n8n. Configure credentials for Zendesk (OAuth or API token), Anthropic (API key), Shopify (API key), and Postgres. If using Railway, the n8n instance and Postgres database can live on the same project. Total infrastructure cost: £10-£25 per month.
Step 2: Build the ticket ingestion workflow (Days 2-3)
Schedule Trigger node, every six hours. Zendesk node fetches tickets updated since last run. For each ticket: extract subject, description, customer ID, product references, tags. Enrich with Shopify data (order details, product line, delivery partner). Send to Claude Haiku via the Anthropic Chat Model node with a classification prompt:
"Score this support ticket's sentiment from -1 (very negative) to +1 (very positive). Assign one or more theme tags from: delivery, quality, returns, communication, pricing, other. Detect sarcasm (British customers frequently use positive language to express negative sentiment). Return JSON only: {sentiment: number, themes: string[], escalation_risk: boolean}."
Store results in Postgres: ticket_id, sentiment_score, themes, product_line, delivery_partner, timestamp, escalation_risk.
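Model output should be validated before it reaches the database. The sketch below shows one way to turn the classifier's JSON reply into the row described above. It is an illustration in Python, not the n8n node configuration. The function names, the fallback to "other" for unknown tags, and the sample enrichment values are all assumptions.

```python
import json
from datetime import datetime, timezone

# Theme tags from the classification prompt.
ALLOWED_THEMES = {"delivery", "quality", "returns", "communication", "pricing", "other"}


def parse_classification(raw_reply: str) -> dict:
    """Parse and validate the model's JSON reply; raise on malformed output."""
    data = json.loads(raw_reply)
    sentiment = float(data["sentiment"])
    if not -1.0 <= sentiment <= 1.0:
        raise ValueError(f"sentiment out of range: {sentiment}")
    # Drop tags outside the agreed set; fall back to 'other' if none survive.
    themes = [t for t in data.get("themes", []) if t in ALLOWED_THEMES]
    return {
        "sentiment_score": sentiment,
        "themes": themes or ["other"],
        # Only a literal JSON true counts, so the string "false" is not truthy.
        "escalation_risk": data.get("escalation_risk") is True,
    }


def to_row(ticket_id: str, enrichment: dict, raw_reply: str) -> dict:
    """Combine the classification with Shopify enrichment into one storage row."""
    return {
        "ticket_id": ticket_id,
        **parse_classification(raw_reply),
        "product_line": enrichment.get("product_line"),
        "delivery_partner": enrichment.get("delivery_partner"),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical reply containing one tag ("shipping") outside the allowed set.
row = to_row(
    "12345",
    {"product_line": "homeware", "delivery_partner": "Partner B"},
    '{"sentiment": -0.8, "themes": ["delivery", "shipping"], "escalation_risk": true}',
)
```

Rejecting out-of-range scores and unknown tags at this point keeps a single malformed reply from skewing the daily averages.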
Step 3: Build the trend aggregation workflow (Days 3-4)
Schedule Trigger, daily at 06:00. Code node queries Postgres for rolling 7-day, 30-day, and 90-day averages by theme, product line, and delivery partner. Second Code node compares current 7-day average against 30-day baseline. If any theme drops more than 15%: send alert to Alison via Slack or email with the theme, the magnitude of the drop, the time period, and three sample tickets.
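The comparison in the second Code node can be sketched as follows. One detail is an assumption: the article does not say how a "15% drop" is measured on a -1 to +1 scale, where a percentage of a near-zero or negative baseline is unstable. This sketch rescales scores to 0-1 first and measures the relative drop there. The 30-day baseline here also includes the most recent 7 days. Adjust both choices to taste.

```python
from statistics import mean

DROP_THRESHOLD = 0.15  # flag when the 7-day average is >15% below the 30-day baseline


def rescale(score: float) -> float:
    """Map a sentiment score from [-1, 1] to [0, 1] (an assumption, see lead-in)."""
    return (score + 1) / 2


def detect_drop(daily_scores: list[float]) -> tuple[bool, float]:
    """Compare the latest 7-day mean against the 30-day baseline for one theme.

    daily_scores: daily mean sentiment, oldest first, at least 30 entries.
    Returns (flagged, relative_drop).
    """
    if len(daily_scores) < 30:
        raise ValueError("need at least 30 days of data")
    baseline = mean(rescale(s) for s in daily_scores[-30:])
    current = mean(rescale(s) for s in daily_scores[-7:])
    if baseline == 0:
        return False, 0.0  # already at the floor; nothing further to drop
    drop = (baseline - current) / baseline
    return drop > DROP_THRESHOLD, drop


# Hypothetical delivery theme: steady at 0.2, then a week at -0.2.
declining = [0.2] * 23 + [-0.2] * 7
stable = [0.2] * 30

flagged, drop = detect_drop(declining)
stable_flagged, stable_drop = detect_drop(stable)
```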
Step 4: Build the escalation detection (Day 4)
Add a branch within the ticket ingestion workflow, after sentiment scoring. If sentiment score is below negative 0.7, the ticket is unresolved, and the customer has contacted more than once: flag as escalation risk. Send immediate alert to Alison with the ticket link and context. This catches the customer who's about to email the MD before they email the MD.
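Expressed as code, the branch condition is a three-part test. This is a sketch, not the n8n IF node itself. The `Ticket` fields are hypothetical names, and treating Zendesk's new, open, pending, and hold statuses as "unresolved" is an assumption.

```python
from dataclasses import dataclass

SENTIMENT_CUTOFF = -0.7
# Assumption: these Zendesk statuses count as unresolved.
UNRESOLVED_STATUSES = {"new", "open", "pending", "hold"}


@dataclass
class Ticket:
    ticket_id: str
    sentiment_score: float
    status: str
    contact_count: int  # hypothetical field: contacts from this customer on this issue


def is_escalation_risk(ticket: Ticket) -> bool:
    """Very negative, still unresolved, and the customer has contacted more than once."""
    return (
        ticket.sentiment_score < SENTIMENT_CUTOFF
        and ticket.status in UNRESOLVED_STATUSES
        and ticket.contact_count > 1
    )


# Hypothetical tickets.
angry_repeat = Ticket("101", -0.85, "open", 3)
angry_first_contact = Ticket("102", -0.85, "open", 1)
angry_but_solved = Ticket("103", -0.9, "solved", 4)
```

All three conditions must hold, which keeps alert volume low: a single angry first contact is normal support work, not an escalation.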
Step 5: Build the reporting output (Days 5-6)
Weekly Schedule Trigger, Monday at 07:00. Query Postgres for the week's aggregations. Send to Claude Sonnet via the AI Agent node with a reporting prompt: "Generate a weekly sentiment summary from this data. Highlight the top three themes by volume, any themes with significant sentiment shifts, delivery partner comparison by complaint rate per volume, and emerging issues. Format as a structured brief for a Monday morning meeting."
Output to Slack channel, email, or Google Sheet. Monthly version for the MD runs on the first of each month with 30-day data.
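The reporting prompt works best when the model receives pre-computed numbers instead of raw tickets. The sketch below builds the "top three themes by volume" part of that input from stored rows. The row shape follows the Postgres columns from Step 2. The function name and sample rows are illustrative.

```python
from collections import defaultdict


def weekly_summary(rows: list[dict]) -> dict:
    """Top three themes by ticket volume, with mean sentiment for each."""
    scores_by_theme: dict[str, list[float]] = defaultdict(list)
    for row in rows:
        for theme in row["themes"]:  # a ticket can carry more than one theme
            scores_by_theme[theme].append(row["sentiment_score"])
    ranked = sorted(scores_by_theme.items(), key=lambda kv: len(kv[1]), reverse=True)
    return {
        "ticket_count": len(rows),
        "top_themes": [
            {
                "theme": theme,
                "tickets": len(scores),
                "mean_sentiment": round(sum(scores) / len(scores), 2),
            }
            for theme, scores in ranked[:3]
        ],
    }


# Hypothetical week of six tickets.
rows = [
    {"themes": ["delivery"], "sentiment_score": -0.8},
    {"themes": ["delivery"], "sentiment_score": -0.4},
    {"themes": ["returns"], "sentiment_score": 0.2},
    {"themes": ["delivery", "communication"], "sentiment_score": -0.6},
    {"themes": ["pricing"], "sentiment_score": 0.5},
    {"themes": ["returns"], "sentiment_score": 0.0},
]
summary = weekly_summary(rows)
```

Serialising `summary` as JSON and placing it in the prompt leaves the model to write the brief while the arithmetic stays in code.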
Step 6: Test and refine (Days 7-10)
Run against two weeks of historical ticket data. Compare the agent's sentiment scores against Alison's manual assessment of 50 sample tickets. If agreement is below 85%, adjust the classification prompt. Verify trend calculations against known patterns ("we know delivery complaints spiked in October"). Go live with monitoring for two weeks before relying on it for the Monday meeting.
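The 85% check needs a definition of "agreement". One reasonable reading, and it is an assumption, is that the agent and Alison agree when both place a ticket in the same bucket: negative, neutral, or positive. The neutral band of ±0.2 below is a hypothetical cutoff, and the scores are invented.

```python
def bucket(score: float, neutral_band: float = 0.2) -> str:
    """Collapse a -1..+1 score into negative / neutral / positive."""
    if score < -neutral_band:
        return "negative"
    if score > neutral_band:
        return "positive"
    return "neutral"


def agreement_rate(agent_scores: list[float], human_scores: list[float]) -> float:
    """Fraction of tickets where agent and human land in the same bucket."""
    if not agent_scores or len(agent_scores) != len(human_scores):
        raise ValueError("score lists must be non-empty and the same length")
    matches = sum(bucket(a) == bucket(h) for a, h in zip(agent_scores, human_scores))
    return matches / len(agent_scores)


# Hypothetical sample of five tickets (the real check uses 50).
agent = [-0.8, -0.3, 0.1, 0.6, 0.4]
alison = [-0.6, -0.5, 0.0, 0.7, -0.4]

rate = agreement_rate(agent, alison)
needs_prompt_work = rate < 0.85
```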
Estimated total build time: 8-10 days for a competent n8n developer. Two to three weeks if learning n8n alongside the build.
Cost Breakdown
Build costs (one-time):
If hiring an n8n developer: 8-10 days at £300-£500 per day = £2,400 to £5,000. If building yourself: £0 plus your time and the learning curve.
Monthly running costs:
| Component | Estimated Monthly Cost |
|---|---|
| n8n (Cloud Starter or self-hosted Railway) | £20-£40 |
| Claude API (Haiku classification + Sonnet reports) | £1-£2 |
| Postgres (Railway or Supabase) | £0-£5 |
| Third-party APIs (Zendesk, Shopify, Trustpilot, social: all included in existing plans or free tiers) | £0 |
| Total | £21-£47 |
The Claude API cost deserves a closer look because the number is genuinely surprising. Ticket classification via Haiku: approximately 3,200 tickets per month at roughly 500 input tokens plus 100 output tokens per ticket = 1.6M input tokens plus 320K output tokens. At Haiku pricing: roughly $0.80 per month. Social mention classification: roughly 200 per month, similar per-item cost, roughly $0.05. Weekly reports via Sonnet: 4 reports at roughly $0.10. Monthly MD report: roughly $0.05.
Total Claude API cost: approximately $1 per month. For classifying 3,200 customer interactions, detecting sentiment trends, and generating weekly reports. The manual equivalent consumed £6,600-£8,800 per year of Alison's time and produced a less accurate, less comprehensive, two-weeks-behind picture.
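The arithmetic is easy to re-run with your own volumes. In this sketch the per-million-token rates are an assumption: $0.25 input and $1.25 output, which reproduce the article's $0.80 figure. Other Haiku versions are priced differently, so check current pricing for whichever model you pin. The Sonnet report costs are taken from the text and treated as monthly totals.

```python
# Assumed rates in USD per million tokens; verify against current pricing.
HAIKU_INPUT_USD_PER_MTOK = 0.25
HAIKU_OUTPUT_USD_PER_MTOK = 1.25


def classification_cost(items: int, input_tokens: int = 500, output_tokens: int = 100) -> float:
    """Monthly classification cost in USD for a given number of items."""
    input_cost = items * input_tokens / 1_000_000 * HAIKU_INPUT_USD_PER_MTOK
    output_cost = items * output_tokens / 1_000_000 * HAIKU_OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


ticket_cost = classification_cost(3200)  # 1.6M input + 320K output tokens
social_cost = classification_cost(200)   # social mentions, same per-item size
weekly_reports_cost = 0.10               # from the article, assumed to be the monthly total
monthly_md_report_cost = 0.05            # from the article

total_usd = ticket_cost + social_cost + weekly_reports_cost + monthly_md_report_cost
```

Even if the real rates are several times higher than assumed here, the API line stays a rounding error next to hosting.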
Year-one total cost: £2,652-£5,564 (with a hired builder) or £252-£564 (self-built). Compared against £22,000-£35,000 in estimated revenue impact from undetected negative sentiment, plus £6,600-£8,800 in Alison's manual synthesis time, plus the unquantified cost of a delivery partner problem running four months undetected.
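For anyone adapting the numbers, the year-one figure is the build cost plus twelve months of running costs. A small sketch using the ranges from the breakdown above:

```python
def year_one_total(build_cost: int, monthly_cost: int, months: int = 12) -> int:
    """One-time build cost plus running costs for the first year, in GBP."""
    return build_cost + monthly_cost * months


# Low and high ends of each range from the cost breakdown.
hired_builder = (year_one_total(2400, 21), year_one_total(5000, 47))
self_built = (year_one_total(0, 21), year_one_total(0, 47))
```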

What Could Go Wrong
Sentiment misclassification. Haiku classifies a sarcastic positive as positive. Include sarcasm detection in the prompt. Test against 50 sample tickets including edge cases. Accept 85-plus percent accuracy (human agents misread tone too). Alison reviews escalation-risk tickets regardless.
Theme drift. A new complaint category emerges (website bug, packaging change) that doesn't fit existing tags. The agent files it under "other." If "other" exceeds 10% of tickets in any month, review a sample and add the new theme to the classification prompt.
API rate limits. Zendesk's API rate limits vary by plan and are measured in hundreds of requests per minute (check the current figure for your plan). At 3,200 tickets per month processed in six-hourly batches, this is not a concern. Use n8n's built-in batch processing with rate limiting as a precaution.
Social API instability. Twitter/X has changed API access terms repeatedly. If the social feed stops, Alison gets an alert (if social mentions drop to zero for seven consecutive days, something is wrong). Social is 10% of volume, not the critical path.
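The silent-feed check is a one-liner worth having in the daily workflow. A minimal sketch, assuming a list of daily social mention counts ordered oldest to newest (the function name and sample counts are illustrative):

```python
def social_feed_silent(daily_mention_counts: list[int], window: int = 7) -> bool:
    """True when the last `window` days all recorded zero social mentions."""
    if len(daily_mention_counts) < window:
        return False  # not enough history to judge
    return all(count == 0 for count in daily_mention_counts[-window:])


# Hypothetical counts: a normal fortnight, then one where the feed dies.
healthy = [6, 9, 4, 7, 0, 5, 8, 6, 7, 3, 0, 0, 5, 6]
broken = [6, 9, 4, 7, 5, 8, 6, 0, 0, 0, 0, 0, 0, 0]

healthy_alert = social_feed_silent(healthy)
broken_alert = social_feed_silent(broken)
```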
Database growth. Sentiment data accumulates. Retain detailed ticket-level data for six months, then aggregate to daily summaries. Estimated storage at 3,200 tickets per month: roughly 50MB per year. Not a concern for years.
Classification prompt degradation. Claude model updates may subtly change classification behaviour. Pin to a specific model version in n8n (for example, claude-haiku-4-5-20251001). Maintain a "golden 50" test set of tickets with known correct classifications. Re-test when upgrading models.
The Pattern
If your customer sentiment monitoring is a weekly meeting where agents report themes from memory, the patterns you're missing are the ones between meetings. The gradual shifts. The ratios across systems. The delivery partner whose complaint rate is disproportionate to their volume. The product line whose sentiment is quietly declining while the return rate stays flat.
Your CS team isn't missing these patterns because they're inattentive. They're missing them because 3,200 tickets across three channels, synthesised from memory once a week, is more data than recall can reliably process. The Monday morning bottleneck here isn't the meeting. It's the 800 tickets between meetings that get reduced to "lots of delivery complaints, I think."
The agent reads every ticket. Alison decides what to do about what it finds. The reading costs $1 per month. The decisions are worth considerably more.
This is Blueprint #45 in the AdAI series. Every week we publish the full architecture of a real AI agent design: the bottleneck, the systems, the logic, the build guide, and the costs. Free to read. Free to build from.
by TG
for the AdAI Ed. Team


