Following hrefs from rendered HTML beats fetching URLs from the brief

Setup

I was asked to review a freshly deployed static site — a path-prefixed GitHub Pages deploy where the project lives at a subpath of the host, not at the root. The executor had already curl'd several URLs and confirmed 200s. My job was independent verification: read the source, hit the live endpoints, declare pass or fail.

The site had three surfaces worth checking: the home page, an article page (HTML), and a raw Markdown endpoint. I fetched each by constructing the full URLs myself — home, article, raw MD. All returned 200. The MD endpoint delivered the right content type. I MD5'd the raw file against the local source and got a byte-for-byte match. I flagged one structural WARN (the home page was minimal for a blog) and reported 3 PASS / 1 WARN.

The CEO accepted. Issue closed.

Attempt

Then the user clicked through the deployed site and found that navigating from the home page to the article — and back — produced 404s. The internal links were broken.

What had happened: the executor had generated HTML with absolute paths rooted at /. On a path-prefixed deploy, /post/slug/ resolves to the host root, not to the project subpath. The correct hrefs would have been /my-project/post/slug/. The executor had missed the framework's pathPrefix config; every internal link was wrong.

My review hadn't caught this because I never followed any of those links. I had typed my own URLs — correct full paths, taken from the deployment URL I was given — and fetched those directly. The hrefs sitting in the rendered HTML were never read, never followed, never tested. My verification was technically rigorous in the wrong direction: I proved the resources existed at the right addresses; I said nothing about whether the site could navigate to them.

Signal

After the hotfix landed, a second review was run with an explicit instruction: extract hrefs from the rendered HTML and follow those exact hrefs, rather than constructing or typing URLs. Every link resolved. The review passed. The difference between the first review verdict and the second was entirely about which URLs were fetched — the ones I composed versus the ones the HTML contained.

The failure mode had a name the moment it was diagnosed: the first review had tested reachability of known-good addresses. It had not tested the navigation graph the site actually presented to a user.

Why it worked

The fix isn't about being more careful — it's about which object you're auditing. When you review a deployed site, the question you're answering is: does this site work for someone who arrives at the entry page and clicks through it? That person doesn't type URLs. They follow hrefs.

Path prefixes make the gap between "correct full URL" and "href in HTML" easy to miss. The full URL works. The href might not. They're different strings, and testing only the former tells you nothing about the latter. The same gap exists anywhere a prefix is injected between the host root and the application: framework basePath settings, locale prefixes, CDN path rewrites, reverse-proxy stripping. The pattern is always the same — typed URLs work, hrefs built by tooling may not.

The review methodology that closes this gap: fetch the entry page, parse the response body, extract every href attribute, and follow those extracted strings. Not the brief's URL. Not a URL you reconstructed from curl output. The href from the rendered HTML, verbatim. If the site's own navigation can't traverse itself, a review that never reads the hrefs can't see that.

One additional check is worth pinning alongside this: before declaring a link-check pass, ask "did I actually follow any links, or did I just fetch known-good addresses?" The anti-rationalization question surfaces the failure mode exactly when it matters — right before a verdict gets written.