How the internal-linking engine works

It reads your site, finds the pages worth boosting, and proposes the internal links that actually deserve to exist. No padding, no guessing.

A look under the hood: where the data comes from, how the agents decide, and the two gates that keep the output clean.


The short version

Give it your site. It pulls performance data from Google Search Console, weights the opportunity with Ahrefs, picks the pages to boost (targets) and the pages to link from (sources), then runs one focused agent per source to propose links. Every proposed link passes two gates before it reaches you: a deterministic check and a quality check. What survives is a Google Sheet of links you can place.

That's the entire idea. The rest of this page is what each of those steps actually does, and where you get to turn the dials.

[Figure: From your site to a link plan. Your data (Search Console + Ahrefs) → Selection (targets + sources) → Matcher agents (one per source) → Two gates (valid, then a good fit) → Link plan (a sheet you place).]

The green path is the agent doing the judgment. Everything around it is data and deterministic checks. Quality is the product, so two gates sit between the agents and you.

Where the data comes in

Search Console finds the pages with unrealised demand. Ahrefs tells you which of those are worth the push. Neither one alone is enough, which is why both feed the selection step.

Search Console is the honest signal, because it's your real performance, not an estimate. The engine reads it for the pages stuck on page two (one nudge from page one), the pages with high impressions but almost no clicks (demand with no payoff), and the queries each page actually ranks for. That last part matters: it gives every page a real keyword profile straight from Google, not a guess.

Ahrefs answers the next question. Search Console says "this page is stuck on page two with 6,000 impressions." Ahrefs says whether that keyword is worth fighting for, by adding search volume and keyword difficulty. A page-two ranker for a high-volume, low-difficulty term is a far better target than one for a term nobody searches. The two together are the "GSC plus Ahrefs" selection: Google finds the candidates, Ahrefs ranks them by real opportunity.

From that combined picture, the engine picks up to 15 targets (the pages to boost) and up to 30 sources (the high-authority pages to link from). It never pads to hit those numbers. Fewer good pages beats more weak ones.
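As a rough illustration of the selection step, the sketch below joins Search Console and Ahrefs figures for each page, scores the opportunity, and caps the target list without padding it. The field names, the page-two range (average position 11 to 20), the scoring formula and the minimum score are all assumptions made for this example, not the engine's actual logic. It also covers only the "stuck on page two" signal, not the high-impression, low-click one.

```python
from dataclasses import dataclass

MAX_TARGETS = 15  # cap from the text; the list is never padded to reach it


@dataclass
class Candidate:
    url: str
    position: float   # average position, from Search Console
    impressions: int  # impressions, from Search Console
    volume: int       # search volume, from Ahrefs
    difficulty: int   # keyword difficulty (0-100), from Ahrefs


def opportunity(c: Candidate) -> float:
    """Hypothetical score: more volume and lower difficulty rank higher."""
    return c.volume * (100 - c.difficulty) / 100


def pick_targets(candidates, min_score=100.0, limit=MAX_TARGETS):
    # Assumed definition of "stuck on page two": average position 11-20.
    page_two = [c for c in candidates if 11 <= c.position <= 20]
    # Weak candidates are dropped rather than kept to fill the quota.
    worthy = [c for c in page_two if opportunity(c) >= min_score]
    worthy.sort(key=opportunity, reverse=True)
    return worthy[:limit]


candidates = [
    Candidate("/guide-a", 12.4, 6000, 5400, 20),  # page two, strong keyword
    Candidate("/guide-b", 14.1, 900, 40, 10),     # page two, nobody searches it
    Candidate("/guide-c", 3.2, 8000, 9000, 30),   # already on page one
]
targets = pick_targets(candidates)
print([t.url for t in targets])
```

Only `/guide-a` survives here: `/guide-b` scores too low to be worth the push, and `/guide-c` is not a page-two page, so the list stays at one target instead of being filled out to the cap.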

One agent per source, working in parallel

Each source page gets its own dedicated agent. That agent reads the full page, holds your whole target list in mind, and proposes up to three links, after checking every target one by one. The agents run many at a time, so 30 sources don't take 30 turns.

The reason for one-agent-per-source is quality, not speed. If you ask a single agent to handle ten pages at once, it satisfices: it finds a decent link and moves on. Give each source its own agent and it does a full sweep against every target before it commits, so it surfaces the genuinely best match instead of the first acceptable one. The parallelism is what keeps that affordable.

[Figure: One agent per source, in parallel. The orchestrator dispatches the batch to Matcher (source 1), Matcher (source 2), Matcher (source 3), ... Matcher (source N).]

The orchestrator fans the work out, runs many matchers at once, then collects the proposals. One source per agent is the lever that keeps matches deep instead of lazy.
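The fan-out pattern could look something like the sketch below, which uses Python's asyncio. The `match_source` function is a stand-in for a real matcher agent, and the `topics` field, the concurrency limit of 5 and the data shapes are assumptions for illustration only.

```python
import asyncio

MAX_LINKS_PER_SOURCE = 3  # from the text: up to three links per source


async def match_source(source, targets):
    """Stand-in for one matcher agent; a real agent would read the full page."""
    await asyncio.sleep(0)  # placeholder for the agent's actual work
    # Check every target one by one before committing to any link.
    hits = [(source["url"], t) for t in targets if t in source["topics"]]
    return hits[:MAX_LINKS_PER_SOURCE]


async def orchestrate(sources, targets, concurrency=5):
    limit = asyncio.Semaphore(concurrency)  # assumed cap on agents in flight

    async def run_one(source):
        async with limit:
            return await match_source(source, targets)

    # Fan out one task per source, then collect proposals in source order.
    batches = await asyncio.gather(*(run_one(s) for s in sources))
    return [link for batch in batches for link in batch]


sources = [
    {"url": "/blog/a", "topics": {"/pricing", "/docs"}},
    {"url": "/blog/b", "topics": set()},  # a source that yields no links
]
proposals = asyncio.run(orchestrate(sources, ["/pricing", "/docs", "/about"]))
print(proposals)
```

Each source gets its own task, so one agent never splits its attention across pages, while the semaphore keeps the number running at once bounded.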

The two gates: why the count is what it is

Every proposed link passes two checks before you ever see it. The first is deterministic and mechanical. The second is a judgment call about whether the link genuinely fits. A link only ships if it clears both.

Gate one: the validity check (rules, no opinion)

This stage is pure rules, run in code, so it's the same every time:

The anchor text must appear verbatim in the page's body, not invented and not pulled from a heading. The phrase must not already be a link. One anchor maps to one target across the whole campaign. No more than three links per source. And two portfolio rules that keep the link profile looking natural rather than manipulated: a single anchor phrase is used at most twice across the campaign, and no single target page hogs more than three incoming links. A link that breaks any of these is dropped with a reason, and the agent's other proposals for that page still get their shot.
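The rules above can be sketched as a single deterministic pass. This is a minimal illustration, assuming each page record carries its body text with headings already stripped (`body`) and the set of phrases that are already links (`linked_phrases`); those field names and the proposal shape are hypothetical.

```python
from collections import Counter

MAX_PER_SOURCE = 3   # links per source
MAX_ANCHOR_USES = 2  # times one anchor phrase may appear in the campaign
MAX_PER_TARGET = 3   # incoming links per target


def validate(proposals, pages):
    """Apply the gate-one rules in order; return (kept, dropped_with_reason)."""
    kept, dropped = [], []
    anchor_target = {}  # anchor phrase -> the single target it may point to
    per_source, per_anchor, per_target = Counter(), Counter(), Counter()

    for p in proposals:
        page, anchor = pages[p["source"]], p["anchor"]
        if anchor not in page["body"]:
            reason = "anchor not found verbatim in body text"
        elif anchor in page["linked_phrases"]:
            reason = "phrase is already a link"
        elif anchor_target.get(anchor, p["target"]) != p["target"]:
            reason = "anchor already maps to another target"
        elif per_source[p["source"]] >= MAX_PER_SOURCE:
            reason = "source already has three links"
        elif per_anchor[anchor] >= MAX_ANCHOR_USES:
            reason = "anchor already used twice"
        elif per_target[p["target"]] >= MAX_PER_TARGET:
            reason = "target already has three incoming links"
        else:
            reason = None

        if reason:
            dropped.append((p, reason))  # later proposals still get checked
            continue
        anchor_target[anchor] = p["target"]
        per_source[p["source"]] += 1
        per_anchor[anchor] += 1
        per_target[p["target"]] += 1
        kept.append(p)
    return kept, dropped


pages = {
    "/blog/a": {
        "body": "Fix duplicate content before you add rich snippets.",
        "linked_phrases": {"rich snippets"},
    },
}
proposals = [
    {"source": "/blog/a", "anchor": "duplicate content", "target": "/duplicates"},
    {"source": "/blog/a", "anchor": "rich snippets", "target": "/snippets"},
    {"source": "/blog/a", "anchor": "canonical tags", "target": "/canonicals"},
]
kept, dropped = validate(proposals, pages)
for proposal, reason in dropped:
    print(proposal["anchor"], "->", reason)
```

Because the checks are plain code with no model in the loop, the same proposals always produce the same result, and every drop carries a reason that can be reported back.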

Gate two: the quality check (does it actually fit?)

A link can be perfectly valid and still be wrong. This second gate is an agent acting as a judge, and it asks one question: does the anchor's promise match what the target page delivers? If the anchor says "duplicate content" but the target is a page about rich snippets, the reader was promised one thing and handed another. That link is valid by the mechanical rules, and it still gets cut here. This is the gate that makes the output actually accurate, not just mechanically valid.
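To show where the judge sits, here is a sketch of gate two with the judge passed in as a function. The `toy_judge` below is a deliberately crude word-overlap check standing in for the judging agent, which would weigh meaning rather than matching words; the target summaries and data shapes are also assumptions.

```python
def toy_judge(anchor: str, target_summary: str) -> bool:
    """Crude stand-in for the judging agent: every word of the anchor
    must appear in the target's summary."""
    summary_words = set(target_summary.lower().split())
    return all(word in summary_words for word in anchor.lower().split())


def quality_gate(valid_links, target_summaries, judge=toy_judge):
    """Split links that passed gate one into shipped and cut."""
    shipped, cut = [], []
    for link in valid_links:
        fits = judge(link["anchor"], target_summaries[link["target"]])
        (shipped if fits else cut).append(link)
    return shipped, cut


target_summaries = {
    "/rich-snippets": "how to earn rich snippets with structured data",
    "/duplicates": "finding and fixing duplicate content across a site",
}
valid_links = [
    # Valid by the rules, but the anchor promises something the page lacks.
    {"anchor": "duplicate content", "target": "/rich-snippets"},
    {"anchor": "fixing duplicate content", "target": "/duplicates"},
]
shipped, cut = quality_gate(valid_links, target_summaries)
print([link["target"] for link in shipped])
```

Keeping the judge as a swappable function means the surrounding pipeline stays deterministic, and only this one step depends on a judgment call.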

[Figure: A real run, honest about the cuts. Sources (processed in full) → Pass validity (rules met) → Links shipped (fit the page too). Annotations: some yield none; quality gate cuts more.]

Sources go in, get processed in full, and only the links that clear both gates ship. The drops are on purpose, not misses: some sources hold no on-topic phrase strong enough to link, and a few valid links fail the fit test.

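The headline number isn't a quota. With 30 sources and three links each, the ceiling on the source side is 90, and the default cap of three incoming links on each of up to 15 targets holds the total to 45 until that dial is loosened. Either way, "up to" never means "manufacture that many." The real count is whatever survives both gates, and a forced link is worse than no link.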

So when a run returns far fewer than the ceiling, that's the gates doing their job. Some sources simply had no on-topic phrase strong enough to link, and a few valid links failed the fit test. That's the same strictness that makes the surviving links accurate. Volume and precision pull against each other, which is exactly why the next section exists.

The dials: tuning strict against loose

The strictness is deliberate and it's adjustable. Each dial trades volume for precision in a predictable direction. You don't rewrite anything. You tell us where you want to sit, ideally against a real example, and we set them.

| Dial | What it controls | Loosen it for |
| --- | --- | --- |
| Links per target | How many incoming links one target page may collect. Default is 3, so no page dominates. | More volume on your priority pages |
| Anchor reuse | How often the same anchor phrase may repeat across the campaign. Default is 2, to keep anchors varied. | More links when one phrase fits many spots |
| Quality threshold | How tight the anchor-to-target fit must be. Default is strict, so only clear matches pass. | Letting "good" matches through, not only textbook ones |
| Target / source count | How many pages enter the run at all. Default is up to 15 targets and 30 sources. | A wider net across more of the site |

The fastest way to tune it

Point us at two or three links you'd have wanted that didn't make it. We'll show you exactly which dial cut each one, then set the dials with you. That beats guessing at settings in the abstract, every time.
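One way to picture the dials is as a small settings object with the documented defaults. The class, its field names and the `"strict"` label are hypothetical; only the default values come from the text above.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Dials:
    links_per_target: int = 3          # incoming links one target may collect
    anchor_reuse: int = 2              # repeats of one anchor phrase per campaign
    quality_threshold: str = "strict"  # assumed label for the default fit level
    max_targets: int = 15
    max_sources: int = 30
    links_per_source: int = 3          # gate-one rule, not one of the listed dials

    def source_ceiling(self) -> int:
        """Most links the sources could carry: a limit, never a quota."""
        return self.max_sources * self.links_per_source

    def target_ceiling(self) -> int:
        """Most links the targets could receive under the per-target cap."""
        return self.max_targets * self.links_per_target


defaults = Dials()
print(defaults.source_ceiling(), defaults.target_ceiling())

# Loosening one dial leaves every other setting untouched.
looser = replace(defaults, links_per_target=6)
print(looser.target_ceiling())
```

Taking the stated defaults at face value, the per-target cap (15 × 3 = 45) binds before the per-source one (30 × 3 = 90), which is one reason loosening "links per target" adds volume.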

What you get, and who places the links

The output is a Google Sheet: one row per source, the proposed links and their exact anchor text, plus a per-target tally showing why each page was worth boosting. A person places the links. That final human check is the point, not a limitation.

The engine proposes; it never edits your live site. Every row is a link a person can read, sanity-check, and place, or skip. That's the right division of labour: the machine does the exhaustive matching and the mechanical validation, and a human keeps the last word. It's the same reason both gates exist. The product is trust, and trust is what survives the checks.
