Robots.txt File Set-Up Process

Want more control over which of your pages Google crawls and indexes?

Then you will benefit from setting up a robots.txt file, which tells crawlers which pages they are allowed to go through and which are disallowed.

Simply follow the step-by-step instructions in this blog to get started and make sure you are on the right track to organising your site’s SEO effectively.

Top Tips

Whenever you see something like “(insertyourdomainhere)”, replace it with your website’s actual details. It is just a placeholder for you to fill in.

Benefits

By completing this process, you make it clear to crawlers which pages you would and wouldn’t like them to go through.
You also gain more control over how your website is indexed.

Current Robots.txt Audit

Use your browser to open up the current version of the robots.txt.
- Unsure how to do this? No problem! Simply input: “http://(insertyourdomainhere).com/robots.txt”
- If one doesn’t appear, then you might not actually have one set up – if that’s the case, please move down to the Robots.txt Creation section.
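If you would rather check from a script than from the browser, here is a minimal Python sketch of the same step. It only uses the standard library; the function names and the example.com address are placeholders we have made up for illustration, so swap in your own domain.

```python
from urllib.error import URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host, whatever page you start from.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots_txt(page_url: str, timeout: float = 10.0):
    """Return the file's text, or None if the request fails (404, no network, ...)."""
    try:
        with urlopen(robots_txt_url(page_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except URLError:  # HTTPError is a subclass, so 404s are caught here too
        return None


if __name__ == "__main__":
    # Placeholder domain: replace with your own site.
    print(robots_txt_url("https://www.example.com/blog/some-post?ref=home"))
```

If `fetch_robots_txt` returns `None`, you probably do not have a file set up yet (or the site could not be reached), so move on to the Robots.txt Creation section.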
Work through the checklist below and fix any point that your current file does not meet:
- The file has been validated with the robots.txt tester in Google Search Console (GSC).
- The robots.txt file is not being used on its own to remove pages that appear in the SERPs – a disallowed page can still show up in results if other pages link to it.
- The robots.txt file disallows pages which are not considered to be of importance – this means they won’t be crawled.
- Sections containing sensitive data are disallowed, so crawlers do not go through them.
- No scripts which are vital to rendering the pages have been disallowed.
To ensure that you have not left any pages out, you’re going to need to go through this testing process:
- Open Google, making sure it is the right domain for the area you are targeting (i.e. .co.uk for the UK).
- Input this into the search bar: “site:(insertyourdomainhere).com”
- You will now be shown the pages on your domain which Google has gone through and indexed. Go through them and find any which don’t meet the standards of the checklist.

Robots.txt Creation

On your device, start off by creating a plain-text document named robots.txt.
Here are 2 templates which you can copy if either of these cases applies to you:
- Disallowing crawlers from the whole website
  - This means Google cannot crawl any part of your website – keep in mind that this could have a negative impact on your search engine rankings.
    - User-agent: *
    - Disallow: /
    - Sitemap: http://(insertyourdomainhere).com/(insertthenameofyoursitemap).xml
- Enabling crawlers to go through the whole website
  - It is okay to have this robots.txt because it does not alter how search engines crawl the website – by default, they go through everything that is available unless you have set a “Disallow” rule. Note that the “Disallow:” line is left empty here.
    - User-agent: *
    - Disallow:
    - Sitemap: http://(insertyourdomainhere).com/(insertthenameofyoursitemap).xml
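The difference between the two templates comes down to one character: “Disallow: /” blocks everything, while an empty “Disallow:” blocks nothing. As a rough sketch, you can confirm this with Python’s built-in `urllib.robotparser` before you upload anything. The `is_allowed` helper and the example.com URLs are our own placeholders, not part of any official tool.

```python
from urllib.robotparser import RobotFileParser

# Template 1: keep every crawler out of the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Template 2: an empty Disallow value means nothing is blocked.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://www.example.com/any-page"  # placeholder URL
    print(is_allowed(BLOCK_ALL, page))  # False
    print(is_allowed(ALLOW_ALL, page))  # True
```

Bear in mind that this parser follows the basic robots.txt rules only, so treat it as a quick sanity check rather than a guarantee of how any particular search engine will behave.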

Most of the time, neither of the above 2 cases will be relevant to your business. Typically, you will want to let all robots crawl your website while keeping them out of specific sections. If that is applicable to you then:
- You can block a particular path and sub-paths
  - You will need to begin by adding this:
    - User-agent: *
  - Path/Subpath/Filetype Blocking – highlight the specific areas that you do not want the crawlers to go through by putting in an additional line for each one:
    - Disallow: /your-path (Specific to Paths)
    - Disallow: /*.filetype$ (Specific to Filetypes, for example /*.pdf$)

- To prevent particular crawlers from going through the website:
  - At the end of the robots.txt file, you’ll need to add these lines:
    - User-agent: (insertcrawlername)
    - Disallow: /
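To see how these rules behave before you publish them, here is a hedged Python sketch covering both cases above. The first half is a small matcher we wrote ourselves for the “*” and “$” wildcards as Google documents them (Python’s built-in parser only does plain prefix matching); it ignores “Allow” rules and rule precedence, so it is a simplification. The second half uses the built-in `urllib.robotparser` to check a block aimed at one named crawler. “ExampleBot”, the paths and the function names are all made-up placeholders.

```python
import re
from urllib.robotparser import RobotFileParser


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Turn a Disallow path into a regex: '*' is any run of characters,
    a trailing '$' pins the end of the URL, anything else is a literal prefix."""
    anchored = rule_path.endswith("$")
    body = rule_path[:-1] if anchored else rule_path
    pattern = ".*".join(re.escape(piece) for piece in body.split("*"))
    return re.compile(pattern + ("$" if anchored else ""))


def is_blocked(url_path: str, disallow_rules: list[str]) -> bool:
    """True if any non-empty Disallow rule matches the start of url_path."""
    return any(rule_to_regex(rule).match(url_path) for rule in disallow_rules if rule)


# A file that blocks one path for everyone and the whole site for one crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private

User-agent: ExampleBot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Ask the standard-library parser whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = ["/private", "/*.pdf$"]
    print(is_blocked("/private/report.html", rules))  # True
    print(is_blocked("/files/guide.pdf", rules))      # True
    print(is_blocked("/blog/post", rules))            # False
    print(can_crawl(ROBOTS_TXT, "ExampleBot", "https://www.example.com/blog"))  # False
    print(can_crawl(ROBOTS_TXT, "OtherBot", "https://www.example.com/blog"))    # True
```

Because of the trailing “$”, a URL such as /files/guide.pdf?download=1 is not matched by /*.pdf$, which is worth keeping in mind if your files are served with query strings.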

Robots.txt Addition To Websites

In this section of the blog, we will go through how you can add a robots.txt file to your site:

If your website runs on WordPress and you currently have the “Yoast SEO” plugin active:
- Start on the admin panel for WordPress
- Make sure to select “SEO” and then “Tools”
  - If this option is not visible to you, you will need to select “Advanced features”
    - Select these in the following order:
      - “SEO”
      - “Dashboard”
      - “Features”
      - “Save Changes”
Now make sure to select “File Editor”
A text box should pop up, allowing you to edit the robots.txt file – paste in the robots.txt file you’ve just created and make sure to select “Save Changes to Robots.txt”.
To check that everything has worked as it should, input: “http://(insertyoursite’sname).com/robots.txt”
The robots.txt file should show up on your site now.
If you use another platform, you will need to upload the robots.txt file to the root directory of your site using either File Transfer Protocol (FTP) or Secure File Transfer Protocol (SFTP). This is not something we will be going through in this blog. However, if you do not have access to upload files to the server, you will need to ask your organisation’s web developer to do so.

Robots.txt Validation Process

Open the Google Search Console robots.txt tester tool to get started with the validation process.
Make sure that you have selected the relevant Google Account – this will then open the testing tool for the robots.txt.
- You will then be able to see the latest robots.txt file which Google has crawled.
Now you’re going to need to open up the robots.txt file and ensure that the version which can be seen in the tool is identical to the one you have live on the domain.
- If you want to view your robots.txt you’re going to need to input: “http://(insertyourdomainhere).com/robots.txt”
- If the two versions do not match, then you’re looking at an outdated copy in Google Search Console and so you’ll need to follow these extra few steps:
  - Hit “Submit” – you’ll see it at the bottom right of your screen – and then hit “Submit” again next to “Ask Google To Update”
  - Hit Ctrl+F5 to refresh the page in your browser
You will now be shown whether there are any warnings or syntax errors in the file.
- If you do notice that some errors have popped up, go back through the steps outlined in this blog and check that the file complies with Google’s guidelines.
If your robots.txt file does not have any problems, you can test individual URLs against it by putting them into the text input section and hitting “Test”.
- If the URL has been blocked by the robots.txt file, a “Blocked” notification will appear instead. The tool will then show the rule which has resulted in the URL being blocked.
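If you have a long list of URLs, testing them one at a time can get tedious. As a rough local sketch of the two checks above (is the live file identical to yours, and which URLs does it block?), you could use Python’s built-in `urllib.robotparser`. The helper names, the “Googlebot” default and the example.com URLs are assumptions for illustration, and this does not replace the Google tool, which also understands wildcards that the built-in parser does not.

```python
from urllib.robotparser import RobotFileParser


def normalise(robots_txt: str) -> list[str]:
    """Drop trailing whitespace and line-ending differences so only real changes count."""
    return [line.rstrip() for line in robots_txt.strip().splitlines()]


def files_match(local_copy: str, live_copy: str) -> bool:
    """True if the file you wrote and the file being served contain the same lines."""
    return normalise(local_copy) == normalise(live_copy)


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs from the list that the rules stop user_agent from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    local = "User-agent: *\nDisallow: /private\n"
    live = "User-agent: *\r\nDisallow: /private\r\n\r\n"  # same rules, Windows line endings
    print(files_match(local, live))
    print(blocked_urls(local, [
        "https://www.example.com/",
        "https://www.example.com/private/accounts",
    ]))
```

Any URL that appears in the returned list is one you would expect to see flagged as “Blocked”, so it is a handy way to spot surprises before you rely on the file.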

And… you’re done!

You have now successfully set up your robots.txt file by following our step-by-step process.

Thanks for reading!

We hope you found this blog useful – if you did, then make sure to check us out on social media so that you can keep updated with our latest news and blog posts.

Posted in Web Development

© 2021 Orisel - Get More Sales Online. All Rights Reserved. Registered Company No. 09886405. Vat No, 292 1751 95