scalably.io ● the work

How domain filtering works

Drop in a long list of websites and your rules for what counts. Get back the shortlist worth keeping, plus a log of everything it cut and exactly why. The junk doesn't just disappear; it leaves a receipt.

A look under the hood: how it decides what stays, the rules that catch the junk, and why nothing is ever cut silently.

The short version

You give it a big list of domains and a few rules: the niches you want, the regions you don't, the kinds of sites to avoid. It reads each site, works out what the business actually is, and sorts every domain into one of three piles: keep, remove, or set aside for you to decide. You get back a clean shortlist of survivors and, separately, a removal log that names each cut domain and the reason it was dropped. If you ask, it'll also find a contact email for the survivors. Every domain you put in is accounted for in what comes out.
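As a rough illustration of the three piles, here is a minimal Python sketch. The names `Pile`, `Decision` and `sort_into_piles`, and the example domains, are made up for this page and are not the tool's actual interface; the only thing taken from the description above is that every domain lands in exactly one pile and carries a reason.

```python
from dataclasses import dataclass
from enum import Enum


class Pile(Enum):
    KEEP = "keep"
    REMOVE = "remove"
    REVIEW = "review"  # set aside for you to decide


@dataclass(frozen=True)
class Decision:
    # Hypothetical record shape: one per input domain.
    domain: str
    pile: Pile
    reason: str


def sort_into_piles(decisions):
    """Group decisions by pile; each input domain appears in exactly one."""
    piles = {pile: [] for pile in Pile}
    for decision in decisions:
        piles[decision.pile].append(decision)
    return piles


decisions = [
    Decision("acme-software.example", Pile.KEEP, "sells its own software"),
    Decision("software-news.example", Pile.REMOVE, "news site"),
    Decision("maybe-agency.example", Pile.REVIEW, "on-topic but borderline"),
]
piles = sort_into_piles(decisions)
```

The shortlist is `piles[Pile.KEEP]`, the removal log is built from `piles[Pile.REMOVE]`, and `piles[Pile.REVIEW]` is what comes back to you.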

That's the whole job. The rest of this page is how it decides, and why the removal log matters as much as the shortlist.

The keep pile is what you came for: the survivors. But the other two piles are the point of the tool. Every removal comes with a reason, and every review case is handed back to you rather than guessed.

It reads the sites, then it cuts

Filtering by a domain name alone is how good sites get thrown out and bad ones get kept. So apart from one obvious shortcut, every decision is made after actually reading the site and working out what the business is.

The one shortcut is the obvious one: if your rule is "no sites from a certain country" and a domain plainly belongs to that country, it's removed without spending time reading it. Beyond that, nothing is judged from the name. For every other domain it fetches the site, reads it, and forms a view of what the business actually does before deciding to keep or cut. The same care that goes into labelling a site goes into filtering it, because a wrong cut is expensive: you lose a good prospect and never know.
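One way to picture the shortcut, as a sketch only: it assumes "plainly belongs to that country" means the domain ends in that country's country-code TLD, which this page does not actually specify. The function name `country_shortcut` and the placeholder code `xx` are invented for illustration.

```python
def country_shortcut(domain, excluded_cc_tlds):
    """Return the excluded country-code TLD the domain ends in, or None.

    Only the final label is checked, so a country code that merely appears
    elsewhere in the name never triggers the shortcut.
    """
    labels = domain.lower().rstrip(".").split(".")
    tld = labels[-1]
    return tld if tld in excluded_cc_tlds else None


excluded = {"xx"}  # placeholder code, not a real country
hit = country_shortcut("shop.example.xx", excluded)
miss = country_shortcut("xx-tools.example.com", excluded)
```

Here `hit` is `"xx"`, so that domain could be removed without a fetch, while `miss` is `None` and the site would be read like any other.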

What gets a domain cut

A domain is removed for one of a few clear reasons, and the reason is always recorded. It's not a vague "quality score" but something specific: wrong kind of business, wrong place, or the tell-tale signs of a site not worth your time.

The rules fall into a few buckets, and you set which ones apply:

Wrong business

The most common cut. If you're after software companies and a site turns out to be a news outlet that writes about software, it's out. This is the same principle that drives the labelling work: a site is what it is, not what it talks about. A directory, a forum, or a training course about your target niche is usually not a real example of that niche, and gets removed.

Wrong place

If you've ruled out certain regions or languages, sites that clearly belong to them are cut. When a region is genuinely ambiguous or sensitive, it asks you rather than guessing.

Not worth the time

The junk signals: a thin, generic site with no real organisation behind it, the look of a link farm built only to game search engines, plain spam, parked domains. Those come out. These are the cuts that make the survivor list actually usable, instead of a slightly shorter version of the mess you started with.

Why the removal log is the real product

Anyone can hand you a shorter list. The thing that makes a filter trustworthy is that you can check its work. So every removed domain comes back with its reason attached, and the borderline cases are handed to you instead of decided for you.

This is the difference between a filter you trust and one you don't. A black-box tool that quietly drops two hundred domains gives you no way to know whether it cut the junk or cut your best prospects. This one writes a removal log: every cut domain, the category it was put in, and a short reason such as "news site", "wrong region", or "no real business behind it". You can scan it in a minute and spot if a rule was too aggressive. And when a domain genuinely could go either way, say an on-topic site that's borderline, it doesn't force a call. It puts it in a review pile and lets you make the judgment.
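To make the log concrete, here is a small sketch of what one could look like as a CSV. The three columns mirror the description above (domain, category, short reason); the file format, the function name `write_removal_log` and the sample rows are assumptions for illustration, not the tool's real output.

```python
import csv
import io


def write_removal_log(removed, out):
    """Write one row per cut domain and refuse to log a cut with no reason."""
    writer = csv.writer(out)
    writer.writerow(["domain", "category", "reason"])
    for domain, category, reason in removed:
        if not reason.strip():
            raise ValueError(f"no reason recorded for {domain}")
        writer.writerow([domain, category, reason])


removed = [
    ("software-news.example", "wrong business", "news site"),
    ("shop.example.xx", "wrong place", "wrong region"),
    ("best-links.example", "not worth the time", "no real business behind it"),
]
buffer = io.StringIO()
write_removal_log(removed, buffer)
log_text = buffer.getvalue()
```

Refusing to write a row without a reason is one simple way to enforce "nothing is ever cut silently" in code.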

The receipt. Because every cut is explained, you can audit the filter in a minute and catch a rule that went too far, instead of trusting a number.

A filter you can't check isn't a filter; it's a guess that happens to be shorter. The log is what turns "trust me" into "here's exactly what I did."

Built to be careful, and to be checked

Two habits keep it honest at scale. It never decides a site is irrelevant just because a page failed to load, and before it judges, it has to write down in plain words what the business is. The label follows the evidence, not a hunch.

If a page won't load, it doesn't quietly assume the worst and cut the domain; it tries another way to read it, and only sets it aside if it genuinely can't. And every keep-or-cut decision starts with the site's business stated in a few plain words, with a line of evidence from the page. That "say what it is before you judge it" step is what stops the tool reaching for a lazy label. The whole thing runs across a list in parallel, with a reader that can't get stuck even on hostile sites, so a few hundred domains come back filtered in minutes rather than an afternoon. And the count is checked: keep plus remove plus review always equals what you put in.
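The habits above can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: the readers and the judge are stand-in functions passed in by the caller, the names `decide` and `filter_domains` are hypothetical, and a thread pool is just one plausible way to run a list in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def decide(domain, readers, judge):
    """Try each reader in turn; a failed load alone never produces a 'remove'."""
    for read in readers:
        try:
            text = read(domain)
        except Exception:
            continue  # this way of reading failed, so try the next one
        return judge(domain, text)  # -> (pile, business, evidence)
    return ("review", "unknown", "could not read the site")


def filter_domains(domains, readers, judge, workers=8):
    """Decide every domain in parallel, then check that nothing went missing."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda d: (d, decide(d, readers, judge)), domains))
    piles = {"keep": [], "remove": [], "review": []}
    for domain, (pile, business, evidence) in results:
        piles[pile].append((domain, business, evidence))
    total = sum(len(v) for v in piles.values())
    assert total == len(domains), "keep + remove + review must equal the input"
    return piles


# Stand-in readers and judge, purely for demonstration.
def broken_reader(domain):
    raise TimeoutError("page did not load")


def fallback_reader(domain):
    if domain == "unreachable.example":
        raise ConnectionError("still unreachable")
    return f"{domain} sells accounting software"


def toy_judge(domain, text):
    business = "software company" if "software" in text else "unknown"
    pile = "keep" if business == "software company" else "remove"
    return (pile, business, text)


piles = filter_domains(
    ["acme.example", "unreachable.example"],
    [broken_reader, fallback_reader],
    toy_judge,
)
```

In this run `acme.example` is kept after the second reader succeeds, and `unreachable.example` lands in review rather than remove because no reader could load it.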

The dials: your rules, your call

The filter is yours to define. You set what counts and what doesn't, and the tool builds its understanding around exactly those rules before it starts. When something is genuinely unclear, it asks rather than assumes.

Dial	What it controls	Set it for
Niches to keep	The kinds of business you actually want. Everything that isn't genuinely one of them gets cut.	A tight shortlist on exactly your targets
Regions to exclude	Countries or languages you don't want. Clear-cut cases are removed, ambiguous ones are flagged.	Keeping the list to your markets
Junk strictness	How hard it cuts on quality signals: the thin, spammy, link-farm sites.	A cleaner list, or a more forgiving one
Find emails too	Off by default. When on, it looks up a contact email for the survivors only.	A shortlist that's ready to act on
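Written down as a settings object, the four dials might look something like the sketch below. The field names and the three strictness levels are invented for illustration; the only detail taken from the table is that email lookup is off by default.

```python
from dataclasses import dataclass, field

# Hypothetical strictness levels; the table only says the dial exists.
STRICTNESS_LEVELS = {"lenient", "normal", "strict"}


@dataclass
class FilterRules:
    niches_to_keep: list[str]
    regions_to_exclude: list[str] = field(default_factory=list)
    junk_strictness: str = "normal"
    find_emails: bool = False  # off by default, as in the table

    def __post_init__(self):
        if self.junk_strictness not in STRICTNESS_LEVELS:
            raise ValueError(f"unknown junk_strictness: {self.junk_strictness!r}")


rules = FilterRules(niches_to_keep=["software companies"], junk_strictness="strict")
```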

How it relates to the other two

This is the one that removes. Its sibling that only labels every domain by niche, and the one that only finds a contact email per site, are separate tools for separate jobs. This one is for when you have a big messy list and need it cut down to just the keepers, with the reasons shown.

What you get, and where it stops

You get a clean shortlist of the domains worth your time, a removal log that justifies every cut, and a small review pile of the genuinely borderline. Optionally, emails for the survivors. Everything you put in is accounted for in what comes back.

The discipline is in the lane. It filters, and it explains the filtering, and it stops there. It doesn't write outreach, doesn't rank the survivors for you, doesn't decide who to contact. The machine does the patient reading and the consistent, reasoned cutting across a list far too long to do by hand. You keep the rules, the review calls, and the final say over who makes the list.
