
How the fulfillment watchdog works

When you send products out to creators, some packages quietly never ship, and some get stuck in transit for weeks. The watchdog checks every package, every morning, and tells you which ones need a human before a creator is left wondering where their package went.

A look under the hood: what it checks, what counts as a problem, and why this is one job we deliberately don't hand to an AI.


The short version

You keep a list of every product going out to a creator. Each morning the watchdog reads that list, checks with the carrier where every package actually is, and sorts them: shipped, delivered, still waiting, or in trouble. The ones in trouble (never shipped, stuck in transit for weeks, or flagged by the carrier as lost or returned) get pulled into a short digest sent straight to whoever runs fulfillment. The only thing it changes on its own is ticking a package off as delivered once the carrier confirms it. Everything else, it surfaces for a person to handle.
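For readers who think in code, the morning pass described above could be sketched roughly like this. It is an illustration under stated assumptions, not the production code: the `Row` fields, the status strings, the `lookup` callable standing in for a carrier, and the three-day limit for an unshipped package are all hypothetical. Only the roughly 30-day transit limit comes from this page.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Callable, Dict, List, Optional


class Bucket(Enum):
    WAITING = "still waiting"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    TROUBLE = "in trouble"


@dataclass
class Row:
    creator: str
    tracking: Optional[str]            # None means nothing has shipped yet
    logged_on: date                    # day the package was added to the list
    shipped_on: Optional[date] = None


# Assumed values for illustration only.
MAX_DAYS_UNSHIPPED = 3                 # hypothetical; the page gives no number
MAX_DAYS_IN_TRANSIT = 30               # "more than a month"
BAD_STATUSES = {"exception", "return_to_sender", "label_expired", "lost"}


def sort_row(row: Row, carrier_status: Optional[str], today: date) -> Bucket:
    """Put one package into exactly one bucket using fixed rules."""
    if row.tracking is None:
        waited = (today - row.logged_on).days
        return Bucket.TROUBLE if waited > MAX_DAYS_UNSHIPPED else Bucket.WAITING
    if carrier_status == "delivered":
        return Bucket.DELIVERED
    if carrier_status in BAD_STATUSES:
        return Bucket.TROUBLE
    if row.shipped_on and (today - row.shipped_on).days > MAX_DAYS_IN_TRANSIT:
        return Bucket.TROUBLE
    return Bucket.SHIPPED


def morning_pass(
    rows: List[Row],
    lookup: Callable[[str], Optional[str]],
    today: date,
) -> Dict[Bucket, List[Row]]:
    """Check every row once and group the rows by bucket."""
    buckets: Dict[Bucket, List[Row]] = {b: [] for b in Bucket}
    for row in rows:
        status = lookup(row.tracking) if row.tracking else None
        buckets[sort_row(row, status, today)].append(row)
    return buckets
```

Passing `lookup` and `today` in as arguments keeps the sketch deterministic: the same list and the same carrier answers always produce the same buckets.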

That's the whole job. It runs across all your brands at once. The rest of this page covers exactly what it catches, and why it's built the way it is.

[Figure: Every shipment checked, every morning. Four steps in order: Your shipments (one row per package), Check each one (where is it now), Flag the problems (before they fester), Morning digest (one clear summary).]

The whole loop, once a day. The green step is the point of the tool: not the list, not the digest, but the catch, the moment a problem gets surfaced while there's still time to fix it.

The three ways a package goes wrong

There are really only three failure modes, and the watchdog is built to catch each one. None of them announce themselves, which is exactly why a tireless daily check beats a person remembering to look.

It never shipped

A package is logged to go out, and then it just sits. Someone meant to send it, got busy, and it slipped. Days pass and the creator hears nothing. The watchdog sees a row that's still marked as waiting and surfaces it, so "we'll send it tomorrow" doesn't quietly become "we never sent it."

It's stuck in transit

The package shipped, the tracking number is real, and then the carrier just stops updating. Most of the time that means it's genuinely stuck, or lost. The watchdog watches the clock: when a package has been in transit for more than a month (30+ days) without arriving, it gets flagged as possibly stuck, along with how many days it's been and the last thing the carrier reported.
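The clock check is simple date arithmetic. Here is a minimal sketch, assuming "more than a month" means more than 30 days since the ship date. The `InTransit` fields and the wording of the flag are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STUCK_AFTER_DAYS = 30  # assumption: "more than a month" read as more than 30 days


@dataclass
class InTransit:
    tracking: str
    shipped_on: date
    last_carrier_event: str   # last thing the carrier reported


def stuck_flag(pkg: InTransit, today: date) -> Optional[str]:
    """Return a digest line if the package looks stuck, otherwise None."""
    days = (today - pkg.shipped_on).days
    if days <= STUCK_AFTER_DAYS:
        return None
    return (
        f"{pkg.tracking}: possibly stuck, {days} days in transit, "
        f"last update: {pkg.last_carrier_event}"
    )
```

A package shipped on 1 April and checked on 15 May has been out 44 days, so it is flagged. One checked on exactly day 30 is not.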

The carrier flagged it

Sometimes the carrier tells you directly that something's wrong: a delivery exception, a return to sender, an expired label. Those signals are easy to miss in a tracking dashboard you don't check daily. The watchdog reads them on every package and pulls the bad ones to the top.
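One way to picture this is a fixed lookup table plus a stable sort. The status codes below are hypothetical, since every carrier and tracking service uses its own vocabulary. Treat this as a sketch of the idea rather than the actual mapping.

```python
from typing import List, Optional, Tuple

# Hypothetical status codes mapped to a plain-language flag.
CARRIER_FLAGS = {
    "exception": "delivery exception",
    "return_to_sender": "returned to sender",
    "label_expired": "label expired",
    "lost": "reported lost",
}


def carrier_flag(raw_status: str) -> Optional[str]:
    """Normalize a raw status string and return its flag, or None if it is fine."""
    key = raw_status.strip().lower().replace(" ", "_")
    return CARRIER_FLAGS.get(key)


def flagged_first(packages: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Order (tracking, raw_status) pairs so flagged packages come first.

    sorted() is stable, so packages keep their original order within each group.
    """
    return sorted(packages, key=lambda p: carrier_flag(p[1]) is None)
```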

[Figure: The three ways a package goes wrong. Still unshipped (sitting, not sent); Stuck in transit (no movement, 30+ days); Carrier exception (lost, returned, failed). All three lead to: Surfaced to you (flagged, never hidden).]

Three quiet failures, one loud outcome: they all end up in front of a human. The tool's only job is to make sure none of them stay invisible.

One pass, every brand

If you run several brands, each has its own list and its own quirks, and the watchdog handles them all from the same engine. You don't run it once per brand by hand; it sweeps them all and sends each its own digest.

Each brand is just a set of details the tool already knows: which list to read, which columns that list uses, who should get the digest. Some brands lay their list out a little differently, and that's fine: the engine adapts to each rather than forcing them all to match. Adding a brand means adding its details to the registry, not writing new code. The result is one consistent watchdog covering your whole operation, with a separate, tailored morning summary for each brand's team.
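A registry like the one described above might look something like the sketch below. The brand names, sheet identifiers, column headers, and email addresses are all invented. The point is only that a new brand is a new entry of data, and that one `normalize` function can read every brand's layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Brand:
    name: str
    sheet_id: str              # which list to read
    columns: Dict[str, str]    # common field name -> this brand's column header
    digest_to: str             # who gets the morning digest


# Hypothetical entries for illustration only.
REGISTRY: List[Brand] = [
    Brand(
        name="Acme Skincare",
        sheet_id="sheet-acme",
        columns={"creator": "Creator", "tracking": "Tracking #", "status": "Status"},
        digest_to="fulfillment@acme.example",
    ),
    Brand(
        name="Borealis Tea",
        sheet_id="sheet-borealis",
        columns={"creator": "Influencer", "tracking": "Tracking Number", "status": "Ship Status"},
        digest_to="ops@borealis.example",
    ),
]


def normalize(brand: Brand, raw_row: Dict[str, str]) -> Dict[str, str]:
    """Translate one brand-specific row into the common field names."""
    return {field: raw_row.get(header, "") for field, header in brand.columns.items()}
```

With rows normalized this way, the checks themselves never need to know which brand a package belongs to.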

Why this one isn't an AI

Most of what we build is an AI agent. This isn't, on purpose. Checking whether a package shipped is a job with exactly one right answer, and the right tool for that is plain, deterministic code, not a language model. It runs for essentially nothing, it never invents a problem that isn't there, and it gives the same answer every time.

This is worth being clear about, because it's a real principle, not a limitation. An AI is the right tool when a job needs judgment: reading a reply, writing a draft, deciding what fits. Watching a status column and comparing a date against a threshold needs no judgment at all, and putting a language model in that loop would only add cost, latency, and the small but real chance of a confident mistake. So this part of the operation is built as a boring, reliable watchdog: no tokens, no guessing, no drift. It does the same exact check on every package, forever. Knowing when not to reach for AI is part of building something you can actually depend on.

An AI is the right tool for judgment. This job has no judgment in it, just a check that has to be right every single time. So it's built to be exactly that, and nothing more.

What it changes, and what it leaves to you

It changes exactly one thing on its own: once the carrier confirms a package arrived, it marks that package delivered, so your list stays current without anyone updating it by hand. Every other problem it finds, it hands to a person. It flags; it never fixes.

That line is deliberate. Ticking off a confirmed delivery is safe and unambiguous, so it does that automatically. But deciding what to do about a package that never shipped, or chasing a carrier about one that's stuck, is a judgment call with a creator on the other end, and that stays with your team. The watchdog also never contacts the creators themselves; everything it says goes to your internal people. Its whole purpose is to make sure the problems reach a human early, while they're still easy to fix, and then get out of the way.
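That boundary is easy to express in code. In the sketch below, with hypothetical field names and finding labels, a confirmed delivery is the only branch that produces a write. Every other finding becomes a line in the internal digest.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PROBLEMS = ("never_shipped", "stuck", "carrier_flag")  # hypothetical labels


@dataclass
class PassResult:
    updates: List[Dict[str, str]] = field(default_factory=list)  # writes back to the list
    digest: List[str] = field(default_factory=list)              # lines for the internal team


def decide(row: Dict[str, str], finding: str, result: PassResult) -> None:
    """Record what to do about one package; nothing here contacts a creator."""
    if finding == "delivered":
        # The single automatic change: tick the package off as delivered.
        if row["status"] != "delivered":
            result.updates.append({"tracking": row["tracking"], "status": "delivered"})
    elif finding in PROBLEMS:
        # Problems are reported to a person, never fixed automatically.
        result.digest.append(f"{row['creator']}: {finding}")
```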

The boring part is the point

A watchdog is only worth having if you can forget it's there. This one runs every morning whether or not anyone remembers to look, costs almost nothing to run, and surfaces the handful of packages that need attention out of the hundreds that are fine. The quiet days, when it finds nothing, are it working exactly as intended.