Pollution Guards Over Governance

If a repo is meant to stay generic, defend it with a pre-commit hook, not a willpower-dependent rule. Mechanical guards beat governance — they don't depend on the author remembering what's allowed at commit time.

Status: active · two repos protected
Domain: repo hygiene · automation

The problem

Some repos are meant to stay generic — pattern libraries, R&D vaults, anything you might publish later. Others are full of business specifics — customer data, internal product names, sensitive workflow logic. The two should never mix. Once business names land in a generic repo, the option to publish it is gone, and the cost of cleanup grows with every commit.

The natural defense is a written rule: "don't put business specifics in this repo." Three sessions later, you're heads-down on a problem, you commit a fix that includes one customer name in a comment, and the rule never fires. By the time you notice, the name is in git history.

The decision

Replace the written rule with a mechanical guard: a pre-commit hook at .git/hooks/pre-commit (marked executable with chmod +x, or git skips it) that does this:

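#!/bin/bash
# Forbidden business names (extended regex; literal dots escaped).
PATTERN='AcmeCorp|InternalProductName|CustomerNameA|customer\.example\.com'

# Skip the check when no added, copied, modified, or renamed files are staged.
[ -z "$(git diff --cached --name-only --diff-filter=ACMR)" ] && exit 0

# Search the staged (index) version of each file rather than the working tree,
# then drop the allowed exception paths: decisions/ and files named README.md.
HITS=$(git diff --cached --name-only --diff-filter=ACMR -z \
       | xargs -0 git grep --cached -lE "$PATTERN" -- \
       | grep -vE '^decisions/|(^|/)README\.md$')
if [ -n "$HITS" ]; then
  echo "Pollution guard: business names in staged files:" >&2
  echo "$HITS" >&2
  exit 1
fi
exit 0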

The hook runs at git commit time. If any staged file contains a forbidden name, the commit fails with a clear message naming the offending files. The author then either removes the reference, moves the file to an allowed exception path (decisions/ or a README.md, where pointers are legitimate), or runs git commit --no-verify after a deliberate decision.
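The exception list is the part of the hook most worth checking in isolation, because a loose filter can quietly exempt more than intended. Below is a sketch that pulls the final decision step into a hypothetical guard_verdict function, assuming "README.md" means a file with exactly that name at any depth. It takes the matching paths on stdin, drops the exempt ones, and fails with the hook's message if anything remains.

```shell
#!/bin/bash
# Sketch (hypothetical helper): the hook's final decision step, split out so
# the exception list can be tested without a repository.
# stdin: paths whose staged content matched the pattern, one per line.
# Exempt (assumed rules): anything under decisions/ and any file named README.md.
# Returns 1 and names the offending files on stderr if anything is left.
guard_verdict() {
  local remaining
  # grep -v exits 1 when every line is filtered out; that is not an error here.
  remaining=$(grep -vE '^decisions/|(^|/)README\.md$' || true)
  if [ -n "$remaining" ]; then
    echo "Pollution guard: business names in staged files:" >&2
    echo "$remaining" >&2
    return 1
  fi
  return 0
}

# Example: the decision record and both READMEs are exempt; README.md.old is not.
printf '%s\n' decisions/0007-acme.md README.md docs/README.md notes/README.md.old \
  | guard_verdict || echo "commit would be blocked"
```

Anchoring README.md to a whole path component is a design choice: a bare substring match would also exempt backups such as README.md.old.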

Why this beats governance

It runs at the moment of risk. Not at code review (too late, the commit's already in). Not at PR time (too far away from the act). At git commit, when the author still has full context.
It doesn't depend on willpower. The author doesn't have to remember the rule. The hook does.
It's reversible. Installing is one file copy plus chmod +x; uninstalling is deleting the file. No tooling adoption, no team training.
It's auditable. The pattern is a single regex line. Anyone can read it, extend it, or argue with it.
It fails loud. A blocked commit forces a real decision: clean it up, route it elsewhere, or override on purpose. Silent governance produces silent drift.
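Because the whole policy is one regex, a proposed change can be argued about with evidence rather than opinion. A minimal sketch, with hypothetical helper names, that runs a candidate pattern against lines it must catch and lines it must leave alone. It assumes the domain's dots are escaped, since an unescaped . matches any character.

```shell
#!/bin/bash
# Sketch (hypothetical helpers): exercise a candidate PATTERN against sample
# text before it goes into the hook. Extended regex, case-sensitive, as in the hook.
matches()     { printf '%s\n' "$2" | grep -qE -- "$1"; }
expect_hit()  { matches "$1" "$2" || { echo "MISSED: $2" >&2; return 1; }; }
expect_miss() { ! matches "$1" "$2" || { echo "FALSE POSITIVE: $2" >&2; return 1; }; }

PATTERN='AcmeCorp|InternalProductName|CustomerNameA|customer\.example\.com'

# Must be caught.
expect_hit  "$PATTERN" 'Fix AcmeCorp import bug'
expect_hit  "$PATTERN" 'see https://customer.example.com/login'
# Must not be caught. With unescaped dots, "." matches any character,
# so this line would be a false positive.
expect_miss "$PATTERN" 'customer-example-com is a placeholder'
```

One case worth a deliberate decision: grep -E is case-sensitive, so a lowercase "acmecorp" slips through unless the search adds -i, which in turn widens the net for false positives.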

What this rejects

Governance by convention. "We have a rule" doesn't catch anything that actually gets committed. Rules without enforcement are decorative.
Code review as the catch. Reviewers miss things. They especially miss things in repos that aren't usually reviewed (personal vaults, experiment dirs).
Ad-hoc cleanup sweeps. "We'll redact this later" never happens. Every week of accumulated pollution is a week of cleanup deferred indefinitely.

What I'd revisit

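The pattern is a plain regex matched against whole files, so it can produce false positives: a name that's also a common English word (rare, but possible) would block legitimate commits. The override is git commit --no-verify, which is the right escape hatch: explicit, scoped to a single commit, and it leaves the guard active for future commits. It does skip every pre-commit and commit-msg hook for that commit, not only this one.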
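One way to trim substring false positives, sketched here as an assumption rather than part of the original hook, is whole-word matching with grep -w. The product name Atlas is hypothetical. The last line shows the limit: when the name is itself an ordinary word, -w changes nothing, and --no-verify or a more specific pattern is still the answer.

```shell
#!/bin/bash
# Sketch: substring matching (what the hook does) vs whole-word matching
# (grep -w). "Atlas" stands in for a hypothetical internal product name.
substring_hit() { printf '%s\n' "$2" | grep -qE -- "$1"; }
word_hit()      { printf '%s\n' "$2" | grep -qwE -- "$1"; }

line='Atlases of the region'
substring_hit 'Atlas' "$line" && echo "substring: blocked"  # "Atlas" is inside "Atlases"
word_hit 'Atlas' "$line" || echo "whole word: allowed"      # no standalone "Atlas"

# -w does not help when the name is itself an ordinary word:
word_hit 'Atlas' 'Bring the road Atlas' && echo "whole word: still blocked"
```

In the hook, the flag would go on the search command; both grep and git grep accept -w.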

The hook lives at .git/hooks/pre-commit, which is per-clone, not committed. New clones have to install it. A reinstall recipe lives at the bottom of the README so cloning is one copy-paste away from being protected. If this becomes a team practice rather than a personal one, a tracked scripts/install-hooks.sh would close the loop.
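The tracked installer could be quite small. This is a sketch, not the author's script: the function names and the hooks/pre-commit source path are assumptions. It installs into the directory reported by git rev-parse --git-path hooks (normally .git/hooks) unless a directory is passed, and includes the matching uninstall.

```shell
#!/bin/bash
# Sketch of a hypothetical tracked scripts/install-hooks.sh. Assumes the
# guard itself is tracked at hooks/pre-commit (also a hypothetical path).

# install_hook SRC [HOOKS_DIR]: copy SRC in as pre-commit and make it
# executable; git ignores a hook file without the executable bit.
install_hook() {
  local src=$1
  # Only calls git when no directory is given.
  local dir=${2:-$(git rev-parse --git-path hooks)}
  mkdir -p "$dir" && cp "$src" "$dir/pre-commit" && chmod +x "$dir/pre-commit"
}

# uninstall_hook [HOOKS_DIR]: delete the installed guard.
uninstall_hook() {
  local dir=${1:-$(git rev-parse --git-path hooks)}
  rm -f "$dir/pre-commit"
}
```

After a fresh clone, sourcing the script from the repository root and running install_hook hooks/pre-commit protects the clone; uninstall_hook reverses it. A copy step means later edits to the tracked hook need a reinstall. Git's core.hooksPath setting, which points git at a tracked hooks directory directly, avoids that at the cost of one git config per clone.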