SEO incidents

How a Stray "Disallow: /" Deindexed an Entire Site

Your site is up. It loads fine. But traffic just fell off a cliff — because one line in robots.txt, or a leftover noindex tag, is telling Google to stay away. This is one of the quietest "up but broken" failures there is: nothing errors, nothing crashes, and you usually find out weeks later when rankings are already gone. Here's exactly how it happens, how to confirm it, and how to make sure it never goes unnoticed again.

  • How "Disallow: /" and noindex silently deindex sites
  • How to confirm whether you're actually blocked
  • How to catch robots.txt changes the day they ship

The failure

How one line takes a whole site off Google

Most teams imagine deindexing as something dramatic — a hack, an outage, a manual penalty. In reality, the most common cause is far more boring: a robots.txt file that, somewhere along the way, picked up this:

User-agent: *
Disallow: /

That tells every compliant crawler to stay out of the entire site. The pages still load, and visitors who already have the URL see everything normally. But Google stops crawling and stops refreshing its index, and over the following days and weeks your pages quietly fall out of search results. Some URLs may linger as bare links with no description, but for practical purposes the site disappears from search.
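To see what that rule means to a crawler, here's a minimal sketch using Python's standard-library `urllib.robotparser`. It is not Google's parser, and the two can differ on edge cases such as wildcards; the function name `is_allowed` and the example.com URLs are placeholders for illustration.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

BLOCK_ALL = "User-agent: *\nDisallow: /\n"

print(is_allowed(BLOCK_ALL, "https://example.com/"))         # False
print(is_allowed(BLOCK_ALL, "https://example.com/pricing"))  # False
```

Every path starts with `/`, so a prefix rule of `/` matches every URL on the site.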

The sibling failure is a leftover noindex. A staging environment ships <meta name="robots" content="noindex"> on every page, or an X-Robots-Tag: noindex header gets copied to production. Same outcome, even sneakier — robots.txt looks clean, but the pages themselves are begging to be removed.

Why it's an "up but broken" problem: nothing returns an error. Uptime monitors see 200 OK and report all green. The site is technically perfect. It's just invisible to the one audience that drives your organic traffic. This is exactly the gap website monitoring exists to close.

The usual suspects

The most common ways this gets shipped

Staging config promoted to prod

Critical

Staging environments block crawlers on purpose with "Disallow: /". A deploy copies the staging robots.txt — or the staging env flag — straight to production. Now the live site is blocked.

CMS "discourage search engines" toggle

Critical

WordPress and many CMSes have a single checkbox that adds a site-wide noindex. A developer ticks it during a rebuild and forgets to untick it before launch.

A bad migration or template change

Critical

A theme update, framework upgrade, or new build pipeline regenerates robots.txt from a default template — and the default happens to disallow everything.

X-Robots-Tag header at the server/CDN

Moderate

A noindex header set at the nginx, Apache, or CDN layer applies to every response. robots.txt looks fine, the HTML looks fine, but the header is removing you from the index.

A "quick fix" that overreaches

Moderate

Someone wants to block one folder, writes "Disallow: /" by mistake instead of "Disallow: /admin/", and ships it. One missing path segment, whole site gone.

A temporary block that was never removed

Low

You block crawlers during a big launch or migration "just for a day" — and the line stays in robots.txt for three months because nothing reminds you it's there.
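Several of these causes can be caught before the deploy goes out. A minimal sketch, assuming a CI step has the robots.txt it's about to ship as a string: the `MUST_CRAWL` list and the function name `overblocked_paths` are hypothetical, and the standard-library parser used here only approximates how Google reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list: paths that must always stay crawlable on your site.
MUST_CRAWL = ["/", "/pricing", "/blog/"]

def overblocked_paths(robots_txt: str, must_crawl=MUST_CRAWL,
                      user_agent: str = "Googlebot") -> list[str]:
    """Return the must-crawl paths that robots_txt would block."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [path for path in must_crawl if not parser.can_fetch(user_agent, path)]

intended = "User-agent: *\nDisallow: /admin/\n"  # what the author meant
shipped = "User-agent: *\nDisallow: /\n"         # what actually went out

print(overblocked_paths(intended))  # []
print(overblocked_paths(shipped))   # ['/', '/pricing', '/blog/']
```

Failing the build whenever the returned list is non-empty turns the "quick fix that overreaches" into a blocked deploy instead of a deindexed site.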

Diagnosis

How to confirm you're actually blocked

1. Read your live robots.txt

Open https://yoursite.com/robots.txt directly in a browser. Look for any Disallow: / under User-agent: *, or under a group that names a specific crawler such as Googlebot. A blank Disallow (Disallow:) means "block nothing", and that's fine. A single slash means "block everything", and that's the problem.
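If you'd rather not eyeball the file, the same check can be scripted. This is a rough sketch that assumes you already have the robots.txt contents as a string; `blanket_disallow_agents` is a hypothetical helper, and it flags any group containing a bare `Disallow: /` even when an `Allow` line partly offsets it, so treat a hit as "worth a look" rather than proof.

```python
def blanket_disallow_agents(robots_txt: str) -> list[str]:
    """Return user-agents whose group contains a bare 'Disallow: /'."""
    flagged: list[str] = []
    agents: list[str] = []  # user-agents of the group being read
    in_rules = False        # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a rule was already seen, so a new group starts here
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/":
                for agent in agents:
                    if agent not in flagged:
                        flagged.append(agent)
    return flagged

print(blanket_disallow_agents("User-agent: *\nDisallow:\n"))    # []
print(blanket_disallow_agents("User-agent: *\nDisallow: /\n"))  # ['*']
```

The blank `Disallow:` case returns nothing because its value is empty, which matches the "block nothing" reading above.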

2. Check for a noindex meta tag or header

View source on a few key pages and search for noindex. Then check response headers with curl -I https://yoursite.com/ and look for an X-Robots-Tag: noindex line. A noindex header takes effect even when the HTML itself contains no noindex tag, so clean-looking markup proves nothing on its own.
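Both places can be checked in one pass. The sketch below assumes you've already fetched a page and have its HTML and response headers in hand; `has_noindex` and `RobotsMetaCollector` are hypothetical names. It treats the `none` directive as a block too, since that is shorthand for noindex plus nofollow, and it does not distinguish directives aimed at one specific crawler.

```python
import re
from html.parser import HTMLParser

# "none" is equivalent to "noindex, nofollow".
_BLOCKING = re.compile(r"\b(noindex|none)\b", re.IGNORECASE)

class RobotsMetaCollector(HTMLParser):
    """Collects the content of <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.contents: list[str] = []

    def handle_starttag(self, tag, attrs):
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() in ("robots", "googlebot"):
            self.contents.append(attr.get("content", ""))

def has_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the page's meta tags or X-Robots-Tag header block indexing."""
    collector = RobotsMetaCollector()
    collector.feed(html)
    header_values = [v for k, v in headers.items() if k.lower() == "x-robots-tag"]
    return any(_BLOCKING.search(text) for text in collector.contents + header_values)

print(has_noindex('<meta name="robots" content="noindex">', {}))  # True
print(has_noindex("<p>Hello</p>", {"X-Robots-Tag": "noindex"}))   # True
```

The second call is the sneaky case: nothing in the HTML, but the header alone is enough to block indexing.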

3. Use Google Search Console

Run the URL Inspection tool on an important page. If it says "Blocked by robots.txt" or "Excluded by 'noindex' tag," you have your answer. Check the Pages (Indexing) report for a spike in excluded URLs — that's the smoking gun.

4. Check how long it's been live

This is the part that hurts. Check your deploy history or the version-control log for robots.txt to find out when the block shipped. Most teams discover it weeks later, because the only signal was a slow traffic decline. The fix takes five minutes; the lost rankings can take months to recover. That gap between "it broke" and "we noticed" is the entire reason to monitor robots.txt continuously rather than auditing it occasionally (see robots.txt monitoring).

Recovery

How to fix it and recover indexing

1. Remove the block

Delete the "Disallow: /" line (or change it to the specific path you meant to block). Remove any site-wide noindex meta tag and the X-Robots-Tag: noindex header. Untick the CMS "discourage search engines" box.

2. Redeploy and verify the live file

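Confirm the production robots.txt and page headers are clean, not just your local repo. CDNs often cache robots.txt, so purge the cache and re-fetch the live URL to be sure. Google also keeps its own cached copy of robots.txt, generally for up to 24 hours, so the fix may not register immediately.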

3. Request re-crawling

In Search Console, use URL Inspection → Request Indexing on key pages, and resubmit your sitemap. This nudges Google to recrawl faster than waiting for the natural cycle.

4. Add a tripwire so it can't happen silently again

Set up continuous robots.txt and noindex monitoring. The next time a deploy reintroduces a block, you find out in minutes — not when quarterly traffic is already down.
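As a rough illustration of what such a tripwire does, here is a sketch of a check that fetches the live robots.txt and the headers of a few key pages, then reports anything that blocks crawling or indexing. All names (`http_get`, `live_crawl_problems`, the `crawl-check-sketch` user agent) are hypothetical, the fetcher is passed in so the logic can be tested without a network, and meta-tag detection is left out to keep it short.

```python
import urllib.request
from urllib.robotparser import RobotFileParser

def http_get(url: str):
    """Fetch url and return (headers, body). Raises on HTTP errors."""
    request = urllib.request.Request(url, headers={"User-Agent": "crawl-check-sketch"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return dict(response.headers.items()), response.read().decode("utf-8", "replace")

def live_crawl_problems(base_url: str, key_paths, get=http_get,
                        user_agent: str = "Googlebot") -> list[str]:
    """Check the live site and return a list of human-readable problems."""
    problems = []
    _, robots_txt = get(base_url + "/robots.txt")
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for path in key_paths:
        url = base_url + path
        if not parser.can_fetch(user_agent, url):
            problems.append(f"robots.txt blocks {url}")
            continue
        headers, _ = get(url)
        tag = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
        if "noindex" in tag.lower():
            problems.append(f"X-Robots-Tag noindex on {url}")
    return problems
```

A post-deploy job could call `live_crawl_problems("https://yoursite.com", ["/", "/pricing"])` and fail or alert whenever the list is non-empty.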

Never again

How to make sure it never goes unnoticed

robots.txt change detection

Sitewatch fetches your robots.txt on every check and diffs it. The moment a "Disallow: /" appears — or your whole file changes — you get an alert with the exact diff.

Noindex detection

Monitoring also watches for noindex meta tags and X-Robots-Tag headers on your real pages, catching the version of this failure that robots.txt audits miss entirely.

Post-deploy checks

Most deindexing happens right after a deploy. A check triggered after each ship confirms your crawl directives survived the release.

Sitemap monitoring alongside it

A blocked site and a broken sitemap often ship together in the same bad migration. Pair robots.txt monitoring with sitemap monitoring to cover the whole crawl path.
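To make the change-detection idea concrete, here's a minimal sketch of diffing two robots.txt snapshots with Python's standard-library `difflib`. This illustrates the general technique only, not how Sitewatch implements it; `robots_diff` and `adds_block_all` are hypothetical names, and storing the previous snapshot is left to you.

```python
import difflib

def robots_diff(previous: str, current: str) -> str:
    """Return a unified diff between two robots.txt snapshots ('' if identical)."""
    lines = difflib.unified_diff(
        previous.splitlines(), current.splitlines(),
        fromfile="robots.txt (previous)", tofile="robots.txt (current)",
        lineterm="",
    )
    return "\n".join(lines)

def adds_block_all(previous: str, current: str) -> bool:
    """True if current contains a 'Disallow: /' line that previous did not."""
    def normalize(line: str) -> str:
        # Strip comments and all whitespace, then lowercase.
        return "".join(line.split("#", 1)[0].split()).lower()

    before = {normalize(line) for line in previous.splitlines()}
    return "disallow:/" not in before and any(
        normalize(line) == "disallow:/" for line in current.splitlines()
    )

old = "User-agent: *\nDisallow: /admin/\n"
new = "User-agent: *\nDisallow: /\n"
print(robots_diff(old, new))
print(adds_block_all(old, new))  # True
```

Alerting on any non-empty diff catches every change; the second function lets you escalate the specific one that takes the whole site out of search.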

Know the moment your site gets blocked

Free plan available. Continuous robots.txt and noindex monitoring — so a bad deploy can't quietly deindex you.