Robots.txt Monitoring
The most expensive one-line deploy mistake, caught the day it ships
A staging config ships to production and now robots.txt says Disallow: /. Search engines are about to stop crawling your client's entire site, and you won't find out until organic traffic collapses weeks later. Sitewatch catches the change the day it happens.
- Flags Disallow: / as a high-severity regression
- Alerts when robots.txt goes from reachable to a server error
- Caught on the next check — not weeks later in your rankings
Robots.txt regression detected
Detected in last check
Affected file
Recent activity
- robots.txt: Disallow: / added, site-wide block (just now)
- Previous: Allow: /, crawlable (6h ago)
- Search engines will stop crawling all pages (just now)
- sitemap.xml: still reachable (1m ago)
A symptom that lags the cause by weeks
An accidental Disallow: / is invisible until your rankings are gone
Disallow: / detection
The single most damaging robots.txt change — telling every search engine not to crawl any page — is flagged as high severity the moment it appears.
Reachability monitoring
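If robots.txt goes from a healthy response to a 5xx or times out, Sitewatch flags it (medium severity). Under the Robots Exclusion Protocol (RFC 9309), a crawler that cannot reach robots.txt because of a server or network error must assume the whole site is disallowed, so an erroring file can act like an accidental block. A missing file (4xx) is treated as no restrictions at all, which silently drops any rules you meant to keep.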
Regression-based, not noisy
Sitewatch compares against the previously-seen robots.txt. It alerts on a change for the worse — not on every harmless edit — so you only hear about real risk.
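To make the regression idea concrete, here is a minimal Python sketch of comparing two robots.txt snapshots. It is an illustration, not Sitewatch's implementation: the function names has_sitewide_disallow and is_regression are hypothetical, and the parser is deliberately simplified (it only looks at the User-agent: * group and does not weigh a competing Allow rule).

```python
def has_sitewide_disallow(robots_txt: str) -> bool:
    """True if the `User-agent: *` group contains a bare `Disallow: /`.

    Simplified on purpose: a competing `Allow` rule is not taken into account.
    """
    agents: list[str] = []   # user-agents of the group currently being read
    in_rules = False         # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:     # a rule was already seen, so a new group starts here
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and "*" in agents:
                return True
    return False


def is_regression(previous: str, current: str) -> bool:
    """Alert only when the site-wide block is new, not on every edit."""
    return has_sitewide_disallow(current) and not has_sitewide_disallow(previous)
```

Comparing against the previous snapshot is what keeps this quiet: a site that has always blocked crawlers, or an edit that only adds a Sitemap line, produces no alert.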
The classic deploy accident
Staging environments routinely block crawlers. When that config ships to production, the block ships with it — easy to do in a hurry, and slow to notice because the site still loads fine.
Protect organic traffic
Lost rankings from an accidental block can take weeks to recover even after the fix. Catching it on day one is the difference between a non-event and a quarter's worth of lost traffic.
In the free scan, too
The robots-disallow-all check runs in the free, no-login public scan as well — so you can spot the problem before you even sign up.
High
Severity for Disallow: /
Day 1
Caught when it deploys
Free + paid
Available on every plan
Robots.txt failure modes
How robots.txt quietly destroys your search visibility
Crawl-blocking regressions
- Disallow: / added — a site-wide block on all crawlers
- A staging robots.txt shipped to production
- A user-agent block that catches Googlebot or Bingbot
- Wildcards that disallow far more than intended
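If you want to test a robots.txt for the first three cases yourself, Python's standard-library urllib.robotparser can answer whether a given crawler may fetch a path. The sketch below is an assumption-laden example rather than how Sitewatch works: blocked_agents is a hypothetical helper, and the agent names are just the two mentioned above. The standard-library parser does plain prefix matching, so wildcard rules are outside what this sketch can judge.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, agents=("Googlebot", "Bingbot"), path="/"):
    """Return the crawlers from `agents` that may not fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse text directly, no network call
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


# A staging-style file blocks everyone; a targeted group blocks one crawler.
staging = "User-agent: *\nDisallow: /\n"
targeted = "User-agent: Googlebot\nDisallow: /\n\nUser-agent: *\nAllow: /\n"
print(blocked_agents(staging))
print(blocked_agents(targeted))
```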
Availability problems
- robots.txt returning a 5xx server error
- robots.txt newly returning 404 or redirecting away
- The file timing out or becoming unreachable
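The availability cases above can be sketched the same way: reduce each check to a coarse state and alert only when a previously healthy file stops being healthy. The state names and the functions classify and availability_regression are hypothetical, chosen for this example; a status of None stands in for a timeout or network error.

```python
from typing import Optional


def classify(status: Optional[int]) -> str:
    """Map an HTTP status (None = timeout or network error) to a coarse state."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "missing"
    if status >= 500:
        return "server_error"
    return "unreachable"  # anything unexpected is treated as not reachable


def availability_regression(prev: Optional[int], cur: Optional[int]) -> Optional[str]:
    """Describe the change if robots.txt was healthy before and is not now."""
    before, after = classify(prev), classify(cur)
    if before == "ok" and after != "ok":
        return f"robots.txt went from ok to {after}"
    return None
```

A file that was already missing on the previous check returns None here, which matches the regression-only approach described earlier on this page.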
Never get silently deindexed again
Add your site and Sitewatch watches robots.txt on every check. Free plan, no credit card.
FAQ
Frequently asked questions
It compares the current robots.txt against the previously-seen version. A new site-wide Disallow: / is flagged high severity; the file becoming unreachable or returning a server error is flagged medium. Harmless edits — adding a sitemap line, allowing a new path — do not alert.
On the next check after the change ships. With deploy hooks, a check can run within seconds of a deploy, so an accidental block is caught immediately, not weeks later when rankings drop.
Yes. The robots-disallow-all check is part of the free, no-login scan as a stateless detection, and part of the paid monitor as a continuous regression check. Run a scan to see your current robots.txt status.
Because the symptom lags the cause. A Disallow: / does not break the site for visitors — pages still load. But search engines stop crawling, rankings decay over the following weeks, and recovery can take just as long after the fix. The only cheap moment to catch it is the day it ships.