What 600 workflows taught us about when automation is the wrong answer

30-second verdict

We have built 600+ workflows. The biggest lesson is restraint, not technique. Automate a process only after it has stopped changing. Keep judgment steps with humans. Budget for maintenance before you build, because every workflow is software you now own. On most rescue engagements we delete automations before we build anything. And if your process still changes every week, you do not need automation or a consultant yet. You need a checklist and a month of doing the work by hand.

The situation

One build explains most of what 600 workflows taught us, so we will tell that one properly.

A client ran a shift-based operation. Workers asked for shifts the way people actually ask for things: free-text Slack messages. "Can I take Saturday close, or Sunday morning if that's gone." "Swap me off Thursday, family thing." No form. No structure. A coordinator read every message, checked the roster in HubSpot, checked who was qualified, checked who was already booked, and wrote the assignment by hand. Bookings ran through Shopify. Zapier moved data between the pieces.

Allocation took hours per cycle. Worse, double bookings happened. Two messages about the same shift would get handled in two browser tabs, and two people would show up for one slot. Someone then had to make an awkward phone call, and the person sent home remembered it.

The constraint: the coordinator was the only person who knew the unwritten rules, and the rules were genuinely judgment-heavy at the edges. Who gets a contested shift. What counts as a fair swap. The brief from the client was, more or less, "make the messages turn into assignments."

What we tried first (and why it failed)

Attempt one: kill the free text. We built a tidy request form. Almost nobody used it. Workers lived in Slack, so requests stayed in Slack, and now there were two intake channels instead of one. Lesson logged: you do not fix an intake problem by adding another intake.

Attempt two: parse the messages with rules. Keyword matching on day names and location words. It broke on the first real week of traffic. "This sat" did not match Saturday. "Anything but Sunday" matched as a Sunday request. A rules engine pointed at human language is a bug factory with a delay on it.
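A minimal sketch of that failure mode, assuming a whole-word keyword matcher. The function name and day list are ours for illustration, not the rules we actually shipped.

```python
import re

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def naive_requested_days(message: str) -> list:
    """Return every day name that appears as a whole word in the message."""
    words = re.findall(r"[a-z]+", message.lower())
    return [day for day in DAYS if day in words]


# The abbreviation is missed: "sat" is not the word "saturday".
missed = naive_requested_days("Can I take this sat")

# The negation is read as a request: the matcher sees "sunday" and nothing else.
inverted = naive_requested_days("Anything but Sunday")
```

Each patch (add "sat" as an alias, special-case "but") fixes one message and leaves the next phrasing to break in production.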

Attempt three is the one that actually hurt. We wired the assignment automation on top of the messy intake, reasoning we could clean up parsing later. Two requests for the same shift arrived a few minutes apart. Both runs read the shift as open. Both wrote an assignment. Zapier showed green checkmarks on both runs, because both runs did exactly what they were told. That is the signature failure of bad automation: not a red error, a green checkmark on the wrong outcome.
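The collision can be reproduced in a few lines. This is a sketch with a plain dictionary standing in for the shift records, and the interleaving is forced by hand: both runs check before either one writes. All names are hypothetical.

```python
shifts = {"sat-close": None}  # shift id -> assigned worker (None means open)


def read_is_open(shift_id):
    return shifts[shift_id] is None


def write_assignment(shift_id, worker):
    shifts[shift_id] = worker
    return "success"


# Both runs perform their check before either one writes.
run_a_sees_open = read_is_open("sat-close")
run_b_sees_open = read_is_open("sat-close")

results = []    # what each run reports
confirmed = []  # workers who were told they have the shift

if run_a_sees_open:
    results.append(write_assignment("sat-close", "worker_a"))
    confirmed.append("worker_a")
if run_b_sees_open:
    results.append(write_assignment("sat-close", "worker_b"))
    confirmed.append("worker_b")
```

Both runs report success and both workers are confirmed, while the record holds only the last writer. Nothing in either run's own log shows a problem.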

Underneath all three attempts sat the same mistake. We were automating a process that had not stopped moving. The coordinator was still adjusting eligibility rules as edge cases surfaced, so the rules we encoded were stale almost as soon as we encoded them. We were pouring concrete on wet ground.

The thing that actually mattered

We stopped building. That was the turn.

For a stretch we did something that feels wrong when a client is paying you to automate: we ran the process by hand, with the coordinator, off a written checklist. Every time an edge case forced a checklist change, we noted it and kept going. Only when the checklist stopped changing did we automate, and we automated only the steps that had survived.

That gave us the first rule we now apply everywhere. Automate after the process is stable, not before. Automation does not improve a process. It repeats one. If the process is still finding its shape, automation repeats the wrong shape faster, at volume, with confidence.

Ownership locks, explained properly

The double booking problem needed an engineering fix, not more carefulness. The fix is old and boring, and most no-code builds skip it: a lock.

Every shift record in HubSpot got a status property. The first action of any run that wants to assign a shift is to claim it: flip the status from open to claimed, and stamp the record with an identifier for that specific run. Then re-read the record. If your stamp is the one on the record, proceed. If it is not, another run got there first, so stop and route the request to the retry lane.
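The claim-then-verify steps above can be sketched as follows. This is an illustration under stated assumptions, not the HubSpot API: ShiftStore is a hypothetical in-memory stand-in, and it assumes the store applies the open-to-claimed flip as a single conditional write.

```python
import threading
import uuid


class ShiftStore:
    """In-memory stand-in for the shift records. Hypothetical; not the HubSpot API."""

    def __init__(self, shift_ids):
        self._mutex = threading.Lock()
        self._shifts = {sid: {"status": "open", "claimed_by": None} for sid in shift_ids}

    def claim_if_open(self, shift_id, run_id):
        # Conditional write: flip open -> claimed and stamp the run id in one step.
        with self._mutex:
            shift = self._shifts[shift_id]
            if shift["status"] == "open":
                shift["status"] = "claimed"
                shift["claimed_by"] = run_id

    def read(self, shift_id):
        with self._mutex:
            return dict(self._shifts[shift_id])


def try_claim(store, shift_id, run_id):
    """Claim, re-read, and report whether this run's stamp is on the record."""
    store.claim_if_open(shift_id, run_id)
    return store.read(shift_id)["claimed_by"] == run_id


store = ShiftStore(["sat-close"])
outcomes = {}


def run(run_id):
    # A losing run would stop here and hand the request to the retry lane.
    outcomes[run_id] = try_claim(store, "sat-close", run_id)


run_ids = [str(uuid.uuid4()) for _ in range(2)]
threads = [threading.Thread(target=run, args=(rid,)) for rid in run_ids]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(outcomes.values()))  # exactly one run holds the claim
```

The sketch leans on the conditional write. Where a tool only offers plain overwrites, the re-read still catches most collisions, but the check and the write are no longer a single step, so that case needs its own testing before you trust it.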

On top of the lock, one rule: exactly one workflow is allowed to write the assignment field. Everything else reads. When two automations can write the same field, you do not have two automations. You have a fight. The locks ended the double bookings.

Where the AI fit, and where the humans stayed

The free-text problem is what AI is actually good for. An OpenAI step read each Slack message and returned structured fields: who, which shift, which date, which location, plus a confidence rating. Note the job description carefully. The AI classified. It never decided.

Anything low-confidence, ambiguous, or unusual dropped into a human lane in Slack for the coordinator. So did every contested shift. Across our builds, that line has held steady. These steps stay human: exceptions, money decisions outside written policy, contested calls between people, apologies, and the first occurrence of anything new. The pattern underneath: automate the steps where a wrong answer costs a retry. Keep humans on the steps where a wrong answer costs a relationship.
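A sketch of the routing step that sits behind the parser, assuming the confidence rating arrives as a number between 0 and 1. The field names, the 0.8 floor, and the route function are hypothetical; the parsing call itself is left out.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_FLOOR = 0.8  # hypothetical threshold; tune it against real traffic


@dataclass
class ParsedRequest:
    worker: Optional[str]
    shift: Optional[str]
    date: Optional[str]
    location: Optional[str]
    confidence: float


def route(request: ParsedRequest, shift_is_contested: bool) -> str:
    """Return 'human' or 'auto'. The parser classifies; this function only routes."""
    fields = (request.worker, request.shift, request.date, request.location)
    if any(value is None for value in fields):
        return "human"  # ambiguous or incomplete parse
    if request.confidence < CONFIDENCE_FLOOR:
        return "human"  # low confidence
    if shift_is_contested:
        return "human"  # contested calls stay with the coordinator
    return "auto"


clear = ParsedRequest("dana", "close", "2024-06-01", "downtown", 0.95)
vague = ParsedRequest("sam", None, "2024-06-02", "downtown", 0.91)
```

Every branch that is not a clean, confident, uncontested request returns "human". The default direction of doubt is toward the coordinator, never toward the write.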

What shipped

The final system reads like a sentence. A Slack message comes in. The OpenAI step parses it into a structured request. The workflow claims the shift with the lock, runs the eligibility checks that came straight off the stabilized checklist, writes the assignment through the single writer, confirms back to the worker in Slack, and keeps the Shopify side in sync. Allocation went from hours to near-instant.

The coordinator did not disappear. The coordinator stopped doing data entry and started doing only the judgment work: the contested shifts, the strange requests, the exceptions. That is the correct end state. The automation took the typing, not the call.

One piece shipped that nobody asked for: an error lane. Every workflow posts its failures to a Slack channel that a human actually reads. Silent failure is the default in these tools. The alerting has to be built on purpose, on day one. This is the shape of most builds in our automation and AI work: a stable manual process underneath, AI on the parsing, locks on the writes, humans on the judgment, and alarms wired in from the start.
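One way to sketch an error lane is a wrapper that reports any raised exception and then lets it propagate. The decorator name and the notify callable are hypothetical; here notify appends to a list, where a real build would post to the Slack channel.

```python
import functools


def with_error_lane(workflow_name, notify):
    """Wrap a step so any exception is reported through notify and then re-raised."""
    def decorator(step):
        @functools.wraps(step)
        def wrapper(*args, **kwargs):
            try:
                return step(*args, **kwargs)
            except Exception as exc:
                notify(f"[{workflow_name}] {step.__name__} failed: {exc!r}")
                raise
        return wrapper
    return decorator


alerts = []  # stand-in for the channel a human reads


@with_error_lane("shift-assignment", alerts.append)
def write_assignment(shift_id, worker):
    if worker is None:
        raise ValueError(f"no worker for {shift_id}")
    return f"{worker} -> {shift_id}"


result = write_assignment("sat-close", "dana")  # succeeds, no alert

try:
    write_assignment("sun-morning", None)       # fails, alert is posted, error still propagates
except ValueError:
    pass
```

A wrapper like this only catches errors that are raised. A step that succeeds on the wrong outcome stays silent, so it needs a separate check on the result itself.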

What we would do differently

The build worked. The lessons below are about everything around it, and they are the patterns that 600+ workflows keep confirming.

The maintenance tax nobody budgets

Every workflow you ship is software you now own. Almost nobody budgets for that. The quote covers the build, and then real life happens. An admin renames a HubSpot property and a Zap starts writing into nothing, without erroring, because as far as Zapier is concerned the step succeeded. A pipeline stage gets edited and three workflows that key on it quietly stop matching. An API version sunsets. A connected trial tool expires.

We cannot give you a universal maintenance number, because there is not one. We can give you the shape. Every automation needs a named owner, alerting that a human reads, and a registry that lists what exists, what triggers it, and what it writes to. We would build the registry first now, before workflow one. If you want to see how undocumented automations decay in the wild, our operations leak audit walks through finding them.
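A registry can start as a list of records with one check run against it. This is a sketch; the Automation fields and the example entries are hypothetical, chosen to mirror the three things named above (what exists, what triggers it, what it writes to) plus owner and alerting.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Automation:
    name: str
    trigger: str
    writes_to: List[str] = field(default_factory=list)
    owner: Optional[str] = None
    alert_channel: Optional[str] = None


def registry_gaps(registry):
    """List (automation name, problem) pairs for entries missing an owner or alerting."""
    problems = []
    for automation in registry:
        if not automation.owner:
            problems.append((automation.name, "no named owner"))
        if not automation.alert_channel:
            problems.append((automation.name, "no alerting a human reads"))
    return problems


registry = [
    Automation(
        name="shift-assignment",
        trigger="new Slack request",
        writes_to=["shift.assignee"],
        owner="coordinator",
        alert_channel="#automation-errors",
    ),
    Automation(name="legacy-status-zap", trigger="contact updated", writes_to=["contact.status"]),
]

gaps = registry_gaps(registry)
```

The format matters less than the habit. A spreadsheet with the same columns does the same job, as long as adding a row is part of shipping the workflow.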

Why most rescue engagements start with deletion

A lot of our work is rescues. Someone inherits an account full of automations and things are misbehaving, and the expectation is that we will build more. On most of these engagements, the first real work is deletion.

We find the same things over and over. A HubSpot workflow and a Zap both writing the same field, flipping it back and forth. Loops, where workflow A updates a record, which triggers workflow B, which updates the record, which triggers workflow A. Zaps with no owner that nobody remembers building and everybody is afraid to turn off. Each one is a writer you cannot account for, and you cannot debug a system with writers you cannot account for.
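Once triggers and writes are listed, two of those patterns can be found mechanically. A sketch, with hypothetical automation and field names; the loop check only covers the two-workflow case described above, not longer chains.

```python
from collections import defaultdict

# automation name -> the fields that trigger it and the fields it writes
automations = {
    "hubspot_stage_sync": {"triggers_on": {"deal.stage"}, "writes": {"contact.status"}},
    "zap_status_push": {"triggers_on": {"contact.status"}, "writes": {"deal.stage"}},
    "zap_status_reset": {"triggers_on": {"form.submitted"}, "writes": {"contact.status"}},
}


def contested_fields(automations):
    """Return fields written by more than one automation, with their writers."""
    writers = defaultdict(list)
    for name, spec in automations.items():
        for field_name in spec["writes"]:
            writers[field_name].append(name)
    return {f: sorted(names) for f, names in writers.items() if len(names) > 1}


def trigger_loops(automations):
    """Return pairs (a, b) where a's writes trigger b and b's writes trigger a."""
    names = sorted(automations)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_fires_b = automations[a]["writes"] & automations[b]["triggers_on"]
            b_fires_a = automations[b]["writes"] & automations[a]["triggers_on"]
            if a_fires_b and b_fires_a:
                pairs.append((a, b))
    return pairs


fights = contested_fields(automations)
loops = trigger_loops(automations)
```

The hard part on a rescue is not this check. It is filling in the dictionary, because the unowned automations are the ones nobody can describe.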

So we delete. Deleting is cheaper than debugging, and a smaller system fails in fewer ways. We turn things off, watch what actually breaks, and rebuild only what something depended on. It is a strange thing to invoice for. It is also usually the highest-value work in the engagement. In line with how we work, the scope is quoted in writing first, so the deletion is on paper before we touch anything.

The three-question test before building anything

Before we build any workflow now, we ask three questions.

Question	If the answer is no
Has this process run the same way, by hand, the last ten times?	Do not automate. Write a checklist and run it manually until the checklist stops changing.
Can you label every step as mechanical or judgment?	You do not understand the process yet, and the automation will not either. Map it first. Automate only the mechanical steps.
Will a named person find out within a day when it breaks?	Build the ownership and the alerting first. An unwatched automation is a liability with a delay on it.
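The table can be written down as a gate that returns what is still owed before building. A sketch; the Process fields and function name are hypothetical, and the action text is taken from the table.

```python
from dataclasses import dataclass


@dataclass
class Process:
    same_way_last_ten_runs: bool
    every_step_labeled: bool
    named_person_alerted_within_a_day: bool


def before_building(process):
    """Return the actions still owed before automating; an empty list means build."""
    todo = []
    if not process.same_way_last_ten_runs:
        todo.append("Write a checklist and run it manually until it stops changing.")
    if not process.every_step_labeled:
        todo.append("Map the process and label each step as mechanical or judgment.")
    if not process.named_person_alerted_within_a_day:
        todo.append("Build the ownership and the alerting first.")
    return todo


still_moving = Process(False, True, True)
ready = Process(True, True, True)
```

For still_moving, the only item returned is the checklist. For ready, the list is empty and the build can be scoped.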

Be honest about what this means. If your process fails question one, you do not need us at $150 an hour. You need a month of doing the work manually with a written checklist. That month is not a delay before the real work. It is the real work, and it costs nothing.

When the test passes, automation pays back hard. One recruitment funnel we built on HubSpot, Zapier, and Eventbrite moved 4,000+ applicants and eliminated 85% of the manual recruitment work. Same tools everyone else has. The difference was a stable process underneath, locks where runs could collide, and humans left on the judgment steps. If you want a second opinion on whether your process is ready, ask us. We will tell you if the honest answer is not yet.

FAQ

How do I know my process is stable enough to automate?

Run it by hand from a written checklist. When the checklist survives the last ten runs without an edit, the mechanical steps are ready. If you are still changing steps as you go, automation will only repeat the wrong shape faster. Stabilize first, then build.

Which steps should never be automated?

Steps where a wrong answer costs a relationship instead of a retry: exceptions, contested calls between people, money decisions outside written policy, apologies, and the first occurrence of anything new. AI can parse and classify on those steps, but a human should decide.

Why would a consultant delete automations we already paid for?

Because overlapping automations fight. Two writers on one field, trigger loops, and unowned Zaps are each a failure source you cannot account for. Deleting shrinks the failure surface and is usually cheaper than debugging. We quote the scope in writing first, turn things off, watch what actually breaks, and rebuild only what something depended on.

What does automation maintenance actually cost?

There is no universal number, but the cost is real. Renamed fields, edited pipeline stages, and API changes break workflows silently, with no error shown. Budget a named owner, error alerting a human reads, and a registry of what exists and what it touches. If nobody would notice a silent failure, do not build that workflow yet.

Want this handled instead of read about?

We scope this exact work in hours, quote it in writing, and ship it in weeks. The 30-minute call is free and useful either way.
