What Is a Robots.txt File (And Why Should You Care)?
Your robots.txt file is a plain text file at the root of your website that tells search engine crawlers which pages to visit and which to leave alone.
Your robots.txt file is a plain text file that sits at the root of your website – usually at yourdomain.com/robots.txt. It tells search engine crawlers like Googlebot which pages they’re allowed to visit and which ones to leave alone.
Think of it as a bouncer at the door of your website. Get the instructions right and the right pages get indexed. Get them wrong and you’re turning away the very visitors you’re paying to attract.
It’s not a complicated file. A basic one might look like this:
    User-agent: *                          # applies to all crawlers
    Disallow: /wp-admin/                   # blocks the backend from being crawled
    Allow: /wp-admin/admin-ajax.php        # keeps certain functions working
    Sitemap: https://yourdomain.com/sitemap.xml
Simple, right? But when someone adds a line they don’t fully understand – or a developer makes a change during a site build – the results can be catastrophic.
The Real Story: How One Line Tanked an Entire Site
A single misplaced line in a robots.txt file can lead to catastrophic loss of organic traffic, as demonstrated by a law firm’s experience.
This isn’t hypothetical. It’s a scenario that plays out regularly, and the pattern is almost always the same.
A law firm in Charlotte hired a web developer to rebuild their site. During the development phase, the developer added Disallow: / to the robots.txt file. This is standard practice – you don’t want Google indexing a half-finished site. The problem? When the site went live, nobody removed it.
That one line, Disallow: /, tells every search engine crawler to stay away from every single page on the site.
Within weeks, the firm’s pages started dropping out of Google’s index. Within two months, they’d lost over 90% of their organic traffic. Calls from new clients dried up. They had no idea why.
When they eventually brought in an SEO specialist, it took about three minutes to find the problem. Three minutes to find it. Months to recover from it.
The Most Common Robots.txt Mistakes That Kill Traffic
Several common robots.txt errors, from blocking entire sites to essential files, can severely impact a website’s search engine visibility and traffic.
That Charlotte law firm isn’t alone. Here are the mistakes that come up again and again across small business websites.
1. Blocking the Entire Site With Disallow: /
As above – this is the nuclear option. It blocks everything. It’s useful during development and utterly devastating if left in place after launch. Always check your robots.txt immediately after a site goes live.
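A post-launch check along these lines can be scripted. The sketch below uses Python's standard-library robots.txt parser; the function name blocks_whole_site and the sample text are made up for illustration, and a blocked homepage is treated as the warning sign for a leftover blanket rule.

```python
from urllib.robotparser import RobotFileParser


def blocks_whole_site(robots_txt: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` is not even allowed to fetch the homepage."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A blocked homepage is the classic symptom of a leftover "Disallow: /".
    return not parser.can_fetch(agent, "https://yourdomain.com/")


# Hypothetical file left over from the development phase.
leftover_from_staging = "User-agent: *\nDisallow: /\n"
print(blocks_whole_site(leftover_from_staging))  # True
```

Running something like this as part of a launch checklist turns "nobody removed it" into a failed check instead of a two-month traffic mystery.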
2. Accidentally Blocking Key Pages or Folders
Imagine you run a dental practice in Denver. Your most valuable pages are your service pages – teeth whitening, dental implants, Invisalign. If your robots.txt contains a line like Disallow: /services/, Google can’t crawl any of them. They disappear from search results. Patients searching “dental implants Denver” won’t find you.
This often happens when someone tries to block one specific folder (like a staging area or internal docs) and accidentally uses a path that matches important content too.
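The reason this happens is that a Disallow path is matched as a prefix of the URL path. Here is a minimal sketch of that behavior, ignoring wildcards; the folder names (/service-docs/, /services/) and the helper is_blocked are hypothetical.

```python
def is_blocked(disallow_path: str, url_path: str) -> bool:
    """Simplified Disallow matching: a plain prefix test, no wildcards."""
    return url_path.startswith(disallow_path)


pages = ["/service-docs/pricing.pdf", "/services/dental-implants/", "/about/"]

too_broad = "/service"       # meant to hide /service-docs/ only
tighter = "/service-docs/"   # full folder name with a trailing slash

print([p for p in pages if is_blocked(too_broad, p)])
# ['/service-docs/pricing.pdf', '/services/dental-implants/']
print([p for p in pages if is_blocked(tighter, p)])
# ['/service-docs/pricing.pdf']
```

Writing out the full folder name with its trailing slash is usually enough to stop a rule from swallowing neighbouring pages.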
3. Blocking CSS and JavaScript Files
Older SEO advice used to recommend blocking CSS and JS files to save crawl budget. That advice is dangerously out of date. Google now needs to render your pages – including their styling and scripts – to understand them properly.
If you block /wp-content/ or your CSS folder, Google sees a broken, unstyled version of your site. It can’t assess your content properly. It may rank you lower as a result, or not at all for competitive terms.
4. Blocking the Wrong User-Agent
Robots.txt rules apply per user-agent (the specific crawler). If you allow only Googlebot and block everything else, you’re also shutting out Bingbot, which still drives meaningful traffic for many businesses – particularly in certain industries and demographics.
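To see how per-crawler groups play out, here is a small sketch using Python's standard-library parser. The robots.txt text is a hypothetical example of the "Googlebot only" setup, and can_crawl is a helper name invented for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: Googlebot may crawl everything, every other crawler nothing.
robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(agent: str, url: str = "https://yourdomain.com/services/") -> bool:
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for agent in ("Googlebot", "Bingbot"):
    print(agent, can_crawl(agent))
# Googlebot True
# Bingbot False
```

Bingbot has no group of its own here, so it falls through to the catch-all group and gets blocked.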
5. Disallowing Your Sitemap’s Location
Your sitemap tells search engines where all your important pages are. If you’ve accidentally blocked the folder your sitemap lives in, crawlers can’t find it. This doesn’t break everything immediately, but it slows down indexing and can mean new pages take weeks longer to appear in search results.
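You can catch this conflict with a short check. The sketch below assumes Python 3.9 or newer and a hypothetical /sitemaps/ folder; blocked_sitemaps is an illustrative helper, not part of any library.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemaps(robots_txt: str, agent: str = "Googlebot") -> list[str]:
    """Return the Sitemap URLs that the same file forbids `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap line.
    sitemaps = parser.site_maps() or []
    return [url for url in sitemaps if not parser.can_fetch(agent, url)]


# Hypothetical file that advertises a sitemap inside a blocked folder.
robots_txt = """\
User-agent: *
Disallow: /sitemaps/
Sitemap: https://yourdomain.com/sitemaps/sitemap.xml
"""
print(blocked_sitemaps(robots_txt))
# ['https://yourdomain.com/sitemaps/sitemap.xml']
```

An empty list means every sitemap the file advertises is also reachable under its own rules.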
How to Check Your Robots.txt Right Now
Checking your robots.txt file is a quick process that involves visiting the file directly and utilizing Google Search Console tools.
This takes two minutes. Do it today.
- Go to your browser and type: yourdomain.com/robots.txt
- Read through every line. Look for any Disallow rules that seem too broad.
- If you see Disallow: / and your site is live, stop everything and fix it immediately.
- Log into Google Search Console, go to Settings > robots.txt, and open the robots.txt report to see the file Google last fetched and any errors or warnings it found.
Google Search Console’s URL Inspection tool will also tell you if a specific page is being blocked by robots.txt. If you type in a URL and see “Blocked by robots.txt”, Google can’t crawl that page, so it can’t read or rank the content on it.
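If you'd rather check a handful of pages in one go, the manual steps above can be roughly approximated in a script. This is a sketch only: find_blocked and the page list are hypothetical, and Python's parser matches rules more simply than Google does, so treat the output as a first pass and confirm anything surprising in Search Console.

```python
from urllib.robotparser import RobotFileParser


def find_blocked(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the URLs from `urls` that `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


# Hypothetical file and pages, using the dental practice example.
robots_txt = "User-agent: *\nDisallow: /services/\n"
key_pages = [
    "https://yourdomain.com/",
    "https://yourdomain.com/services/teeth-whitening/",
    "https://yourdomain.com/services/dental-implants/",
]
print(find_blocked(robots_txt, key_pages))
# ['https://yourdomain.com/services/teeth-whitening/', 'https://yourdomain.com/services/dental-implants/']
```

To run it against a live site instead of pasted text, RobotFileParser("https://yourdomain.com/robots.txt") followed by read() downloads the file before you call can_fetch.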
What a Healthy Robots.txt Looks Like
A healthy robots.txt for most small businesses is minimal, protecting backend areas while allowing full crawl access to public content and the sitemap.
For most small business websites – a dental practice, a law firm, a plumber, an accounting firm – a sensible robots.txt is minimal. Here’s a clean example for a WordPress site:
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php
    Disallow: /wp-login.php
    Sitemap: https://yourdomain.com/sitemap.xml
That’s it. You’re protecting your backend, you’re allowing Google to crawl everything it needs, and you’re pointing it to your sitemap. Clean and simple.
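If you're wondering how an Allow line can sit inside a blocked folder, the sketch below may help. It assumes the most-specific-rule-wins behavior that Google documents: the longest matching path decides, and Allow wins a tie. It ignores wildcards, and is_allowed is an illustrative helper, not Google's actual implementation.

```python
def is_allowed(rules: list[tuple[str, str]], url_path: str) -> bool:
    """rules: (directive, path) pairs taken from one user-agent group."""
    best_len, verdict = -1, True  # no matching rule means the URL is allowed
    for directive, path in rules:
        if not path or not url_path.startswith(path):
            continue
        allow = directive.lower() == "allow"
        # The longer path wins; on a tie, Allow wins.
        if len(path) > best_len or (len(path) == best_len and allow):
            best_len, verdict = len(path), allow
    return verdict


wordpress_rules = [
    ("Disallow", "/wp-admin/"),
    ("Allow", "/wp-admin/admin-ajax.php"),
    ("Disallow", "/wp-login.php"),
]
for path in ("/wp-admin/", "/wp-admin/admin-ajax.php", "/services/"):
    print(path, is_allowed(wordpress_rules, path))
# /wp-admin/ False
# /wp-admin/admin-ajax.php True
# /services/ True
```

Some simpler parsers apply the first matching rule instead, which is one reason to confirm edge cases in Search Console rather than trusting a single tool.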
When Should You Block Pages With Robots.txt?
Robots.txt should be used to block pages with no SEO value or those intended for internal use, but never public-facing content or essential site files.
Robots.txt isn’t always the villain. There are legitimate reasons to block certain areas of your site from crawlers.
Pages Worth Blocking
- Admin and login pages – there’s no SEO value and no reason for Google to index your WordPress login screen
- Thank you pages – after a form submission, you don’t want these indexed (a noindex meta tag is the more reliable tool here, and Google can only see it if the page is not also blocked in robots.txt)
- Internal search results pages – if your site has a search function, the results pages are usually thin content with no keyword value
- Staging or duplicate environments – if you have a staging subdomain, block it entirely to avoid duplicate content issues
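- Private document folders – anything internal that shouldn’t show up in search (robots.txt is itself publicly readable and is not access control, so put truly sensitive files behind a login)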
Pages You Should Never Block
- Your homepage
- Service pages or product pages
- Location pages (critical for local SEO)
- Blog posts and articles
- Your CSS, JavaScript, and image files
- Your sitemap
The rule of thumb is simple: if a page is something you’d want a customer to find on Google, don’t block it.
Robots.txt vs Noindex: Understanding the Difference
Robots.txt prevents crawling, while the noindex meta tag allows crawling but prevents indexing, making noindex generally more reliable for keeping pages out of search results.
This is where even experienced marketers get confused, so let’s be clear about it.
| Feature | Robots.txt | Noindex Meta Tag |
|---|---|---|
| Purpose | Tells crawlers not to visit a page at all. | Lets Google crawl the page, but tells it not to include it in search results. |
| Crawling | Prevents crawlers from accessing the page content. | Allows crawlers to access and read the page content. |
| Indexing | Can still appear in search results (often with no description) if linked externally, as Google knows it exists. | Prevents the page from appearing in search results. |
| Reliability for Hiding Pages | Less reliable for ensuring a page is completely hidden from search results due to external links. | More reliable for ensuring a page is kept out of Google’s search results. |
| Best Use Case | Managing crawling behavior (e.g., blocking admin areas, staging sites). | Managing indexing behavior (e.g., thank you pages, internal search results). |
For most small businesses, if you want a page kept out of Google, you’re better off using a noindex tag rather than robots.txt blocking. It’s cleaner, more reliable, and avoids the messy edge cases that come with crawl blocking.
Use robots.txt to manage crawling. Use noindex to manage indexing. They’re not the same thing.
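One practical consequence is that a noindex tag only works on pages a crawler is allowed to fetch. Here is a rough sketch of that logic; the class and function names and the sample HTML are illustrative assumptions, and it only looks at the generic robots meta tag.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Records whether a <meta name="robots"> tag contains noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


def noindex_is_visible(blocked_by_robots_txt: bool, html: str) -> bool:
    """A crawler can only obey noindex on a page it is allowed to fetch."""
    return has_noindex(html) and not blocked_by_robots_txt


thank_you_page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(noindex_is_visible(False, thank_you_page))  # True
print(noindex_is_visible(True, thank_you_page))   # False
```

So if you want a page out of the index, leave it crawlable and let the noindex tag do the work.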
The Crawl Budget Argument: Is It Actually Worth Worrying About?
For most small business websites, crawl budget is not a significant concern, and over-optimizing for it can lead to accidental blocking and traffic loss.
You might hear SEO people talk about “crawl budget” – the idea that Google only has a limited number of crawls it’ll dedicate to your site, so you should use robots.txt to direct crawlers away from unimportant pages.
Here’s the honest take: for most small business websites, crawl budget is not your problem.
Crawl budget matters for very large sites – e-commerce stores with tens of thousands of products, news sites publishing hundreds of articles a day. If you’re a family law firm in Austin with 30 pages on your site, Google can crawl the whole thing in minutes. You don’t need to optimize for crawl budget.
Worrying about crawl budget on a small site leads to unnecessary robots.txt rules, which leads to accidental blocking, which leads to traffic loss. Don’t solve a problem you don’t have.
What to Do Next
Immediately check your robots.txt file for critical errors, especially after site relaunches, and use Google Search Console to confirm proper indexing.
If you haven’t looked at your robots.txt file before, do it now. Seriously – open a new tab.
- Check your robots.txt by visiting yourdomain.com/robots.txt and reading every line.
- Look for Disallow: / – if you see this and your site is live, it needs to go immediately.
- Open Google Search Console and run a few of your most important pages through the URL Inspection tool. Confirm they’re indexed and not blocked.
- Check the Page indexing report (formerly Coverage) in Search Console for any URLs flagged as “Blocked by robots.txt.”
- If you’ve recently relaunched your site or switched developers, make robots.txt the first thing you audit. It’s the most commonly broken thing after a migration.
- Keep it simple. Don’t add rules unless you have a specific reason to. The fewer lines in your robots.txt, the less chance of an accidental catastrophe.
A robots.txt file is tiny – often less than ten lines. But it controls whether search engines can see your entire website. Treat it with the same care you’d give your homepage. One wrong line, left unchecked, can undo months of SEO work and cost you more in lost leads than any other technical error on your site.
FAQ
What is the primary function of a robots.txt file?
The robots.txt file instructs search engine crawlers, like Googlebot, which parts of your website they are allowed to access and which they should avoid, essentially acting as a guide for their crawling behavior.
When should I use a noindex tag instead of robots.txt to hide a page?
You should use a noindex meta tag when you want search engines to crawl a page but not display it in search results. Robots.txt prevents crawling entirely, but a page might still appear in search results if linked externally, making noindex a more reliable method for preventing indexing.
Is crawl budget a concern for small business websites?
For most small business websites with a limited number of pages, crawl budget is generally not a concern. Google can typically crawl small sites quickly, and over-optimizing for crawl budget can lead to unnecessary robots.txt rules and potential accidental blocking of important content.


