SEO incidents
How a Stray "Disallow: /" Deindexed an Entire Site
Your site is up. It loads fine. But traffic just fell off a cliff — because one line in robots.txt, or a leftover noindex tag, is telling Google to stay away. This is one of the quietest "up but broken" failures there is: nothing errors, nothing crashes, and you usually find out weeks later when rankings are already gone. Here's exactly how it happens, how to confirm it, and how to make sure it never goes unnoticed again.
- How "Disallow: /" and noindex silently deindex sites
- How to confirm whether you're actually blocked
- How to catch robots.txt changes the day they ship
The failure
How one line takes a whole site off Google
Most teams imagine deindexing as something dramatic — a hack, an outage, a manual penalty. In reality, the most common cause is far more boring: a robots.txt file that, somewhere along the way, picked up this:
User-agent: *
Disallow: /
That tells every crawler to stay out of the entire site. The pages still load. Visitors who already have the URL see everything normally. But Google stops crawling, stops refreshing its index, and — over the following days and weeks — quietly drops your pages from search results.
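To make the effect concrete, here is a minimal sketch using Python's standard-library urllib.robotparser. The is_blocked helper and the example.com URLs are illustrative assumptions, and the standard-library parser only approximates Google's matching rules, so treat the result as a sanity check rather than a verdict.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots_txt forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Three files that differ by only a few characters (hypothetical examples).
BLOCK_ALL = "User-agent: *\nDisallow: /"
BLOCK_ADMIN = "User-agent: *\nDisallow: /admin/"
BLOCK_NOTHING = "User-agent: *\nDisallow:"

if __name__ == "__main__":
    page = "https://example.com/pricing"
    for name, rules in [
        ("block all", BLOCK_ALL),
        ("block /admin/ only", BLOCK_ADMIN),
        ("block nothing", BLOCK_NOTHING),
    ]:
        print(f"{name}: blocked={is_blocked(rules, page)}")
```

Only the first file keeps a crawler away from an ordinary page, which is why a single missing path segment is enough to take the whole site out of the crawl.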
The sibling failure is a leftover noindex. A staging environment ships <meta name="robots" content="noindex"> on every page, or an X-Robots-Tag: noindex header gets copied to production. Same outcome, even sneakier — robots.txt looks clean, but the pages themselves are begging to be removed.
Why it's an "up but broken" problem: nothing returns an error. Uptime monitors see 200 OK and report all green. The site is technically perfect. It's just invisible to the one audience that drives your organic traffic. This is exactly the gap website monitoring exists to close.
20 detection rules · 5–30 min check intervals · Free: 1 site
The usual suspects
The most common ways this gets shipped
Staging config promoted to prod
Critical: Staging environments block crawlers on purpose with "Disallow: /". A deploy copies the staging robots.txt, or the staging env flag, straight to production, and the live site is now blocked.
CMS "discourage search engines" toggle
Critical: WordPress and many other CMSes have a single checkbox that adds a site-wide noindex. A developer ticks it during a rebuild and forgets to untick it before launch.
A bad migration or template change
Critical: A theme update, framework upgrade, or new build pipeline regenerates robots.txt from a default template, and the default happens to disallow everything.
X-Robots-Tag header at the server/CDN
Moderate: A noindex header set at the nginx, Apache, or CDN layer applies to every response. robots.txt looks fine and the HTML looks fine, but the header is removing you from the index.
A "quick fix" that overreaches
Moderate: Someone wants to block one folder, writes "Disallow: /" instead of "Disallow: /admin/", and ships it. One missing path segment, whole site gone.
A temporary block that was never removed
Low: You block crawlers during a big launch or migration "just for a day", and the line stays in robots.txt for three months because nothing reminds you it's there.
Diagnosis
How to confirm you're actually blocked
1. Read your live robots.txt
Open https://yoursite.com/robots.txt directly in a browser. Look for any Disallow: / under User-agent: *. A blank Disallow (Disallow:) means "block nothing" — that's fine. A single slash means "block everything" — that's the problem.
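Reading the file by eye works for short files; for longer ones, the same check can be sketched in a few lines of Python. The function name below is hypothetical, and the sketch deliberately ignores Allow overrides and wildcard patterns other than "/*", so it flags candidates for a human to review rather than reproducing Google's full matching logic.

```python
def agents_blocked_everywhere(robots_txt: str) -> list[str]:
    """List user agents whose group contains a blanket 'Disallow: /'."""
    blocked: list[str] = []
    group: list[str] = []   # user agents of the group currently being read
    in_rules = False        # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:    # a User-agent line after rules starts a new group
                group, in_rules = [], False
            group.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            # "/" and "/*" both match every path; an empty value matches none.
            if field == "disallow" and value in ("/", "/*"):
                blocked.extend(a for a in group if a not in blocked)
    return blocked


if __name__ == "__main__":
    sample = "User-agent: *\nDisallow: /\n"
    print(agents_blocked_everywhere(sample))
```

An empty result means no group blocks the whole site; any agent in the result deserves a second look, especially "*" or Googlebot.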
2. Check for a noindex meta tag or header
View source on a few key pages and search for noindex. Then check the response headers with curl -I https://yoursite.com/ and look for an X-Robots-Tag: noindex line. The header applies even when the HTML itself looks clean, so check both.
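Both checks can be automated. The sketch below uses only Python's standard library and works on HTML and header values you have already fetched; the function names are assumptions made for illustration. It treats "none" as equivalent to noindex and accepts an optional crawler prefix such as "googlebot:" in the header value.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",")]


def html_has_noindex(html: str) -> bool:
    """True if a robots meta tag in `html` asks crawlers not to index the page."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return any(d in ("noindex", "none") for d in finder.directives)


def header_has_noindex(x_robots_tag: str) -> bool:
    """True if an X-Robots-Tag value such as 'googlebot: noindex, nofollow' blocks indexing."""
    for part in x_robots_tag.lower().split(","):
        # Drop an optional "crawler:" prefix before comparing the directive.
        directive = part.split(":", 1)[-1].strip()
        if directive in ("noindex", "none"):
            return True
    return False


if __name__ == "__main__":
    print(html_has_noindex('<meta name="robots" content="noindex, nofollow">'))
    print(header_has_noindex("noindex"))
```

Parsing the tag instead of searching the raw source for the word "noindex" avoids false alarms from pages that merely mention the term in their text.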
3. Use Google Search Console
Run the URL Inspection tool on an important page. If it says "Blocked by robots.txt" or "Excluded by 'noindex' tag," you have your answer. Check the Pages (Indexing) report for a spike in excluded URLs — that's the smoking gun.
4. Check how long it's been live
This is the part that hurts. Most teams discover the block weeks after it shipped, because the only signal was a slow traffic decline. The fix takes five minutes; the lost rankings take months to recover. That gap between "it broke" and "we noticed" is the entire reason to monitor robots.txt continuously rather than auditing it occasionally — see robots.txt monitoring.
Recovery
How to fix it and recover indexing
Remove the block
Delete the "Disallow: /" line (or change it to the specific path you meant to block). Remove any site-wide noindex meta tag and the X-Robots-Tag: noindex header. Untick the CMS "discourage search engines" box.
Redeploy and verify the live file
Confirm that the production robots.txt and page headers are clean, not just the copy in your local repo. CDNs can cache robots.txt, so purge the cache and re-fetch the live URL to be sure.
Request re-crawling
In Search Console, use URL Inspection → Request Indexing on key pages, and resubmit your sitemap. This nudges Google to recrawl faster than waiting for the natural cycle.
Add a tripwire so it can't happen silently again
Set up continuous robots.txt and noindex monitoring. The next time a deploy reintroduces a block, you find out in minutes — not when quarterly traffic is already down.
Never again
How to make sure it never goes unnoticed
robots.txt change detection
Sitewatch fetches your robots.txt on every check and diffs it. The moment a "Disallow: /" appears — or your whole file changes — you get an alert with the exact diff.
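The diffing idea itself is simple enough to sketch with Python's standard-library difflib. This illustrates the general technique under assumed names and file labels; it is not a description of how Sitewatch implements it.

```python
import difflib


def robots_diff(previous: str, current: str) -> str:
    """Return a unified diff of two robots.txt snapshots, or '' if unchanged."""
    lines = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile="robots.txt (last check)",
        tofile="robots.txt (now)",
        lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    before = "User-agent: *\nDisallow: /admin/"
    after = "User-agent: *\nDisallow: /"
    diff = robots_diff(before, after)
    if diff:
        print(diff)   # a real monitor would send this in the alert
```

Storing the last known-good snapshot and alerting on any non-empty diff catches both a new "Disallow: /" and a wholesale replacement of the file.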
Noindex detection
Monitoring also watches for noindex meta tags and X-Robots-Tag headers on your real pages, catching the version of this failure that robots.txt audits miss entirely.
Post-deploy checks
Most deindexing happens right after a deploy. A check triggered after each ship confirms your crawl directives survived the release.
Sitemap monitoring alongside it
A blocked site and a broken sitemap often ship together in the same bad migration. Pair robots.txt monitoring with sitemap monitoring to cover the whole crawl path.
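One way to cover both at once is to cross-check the two files: every URL you list in the sitemap should be crawlable under your own robots.txt. The sketch below assumes a standard sitemaps.org urlset document and uses Python's standard-library parsers; the function name is hypothetical, and the standard-library robots parser only approximates Google's matching rules.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# XML namespace used by standard sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def blocked_sitemap_urls(
    robots_txt: str, sitemap_xml: str, agent: str = "Googlebot"
) -> list[str]:
    """Return the sitemap URLs that robots_txt forbids `agent` from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]
    return [url for url in urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    sitemap = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/admin/login</loc></url>"
        "</urlset>"
    )
    print(blocked_sitemap_urls("User-agent: *\nDisallow: /admin/", sitemap))
```

A non-empty result means the sitemap is inviting crawlers to pages that robots.txt turns away, which is a strong hint that one of the two files is wrong.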
Common questions
Deindexing after a "Disallow: /" is not instant, but it is quick. Google stops crawling blocked URLs once it picks up the new robots.txt (it generally caches the file for up to 24 hours) and stops refreshing them in the index. Over the following days to weeks, pages drop out of results. The danger is the delay: by the time traffic visibly falls, the block has often been live for a while.
Disallow in robots.txt tells crawlers not to fetch a URL. Noindex (a meta tag or X-Robots-Tag header) tells Google not to keep a fetched page in the index. Confusingly, if you block a page with robots.txt, Google may never see its noindex, because it cannot fetch the page to read the directive. To remove a page from the index on purpose, use noindex and leave the URL crawlable. To undo an accidental block, remove both.
Uptime monitoring misses this failure because nothing was down. The server returned 200 OK, pages rendered, assets loaded. Uptime monitoring only answers "is it responding?", not "is Google allowed to index it?" Detecting a crawl-blocking change requires fetching and diffing robots.txt and inspecting page directives, which is what website monitoring does.
Fixing the block takes minutes. Recovering from it does not. Once you remove the block and request re-crawling, Google has to recrawl and re-evaluate every page, then rebuild rankings. Light cases recover in days; a site that was blocked for weeks can take a month or more to fully bounce back. This is why catching it early matters so much.
Know the moment your site gets blocked
Free plan available. Continuous robots.txt and noindex monitoring — so a bad deploy can't quietly deindex you.