Robots.txt Generator


Default - All Robots are: (Allowed / Refused)

Crawl-Delay:

Sitemap: (leave blank if you don't have one)

Search Robots: Google, Google Image, Google Mobile, MSN Search, Yahoo, Yahoo MM, Yahoo Blogs, Ask/Teoma, GigaBlast, DMOZ Checker, Nutch, Alexa/Wayback, Baidu, Naver, MSN PicSearch

Restricted Directories: The path is relative to root and must contain a trailing slash "/"

Now, create a 'robots.txt' file in your root directory. Copy the generated text above and paste it into the file.


About Robots.txt Generator

Have you ever wondered how Google, Bing, or Yahoo! finds websites for their search results? Major search providers offer tools for manually submitting URLs for indexing, but most of the time these services discover content automatically with "web crawlers." Search engine crawlers are designed to systematically browse the web and crawl websites so that pages can be indexed for their respective search engines. This is one of the primary ways websites come to appear in SERPs (search engine results pages).

Many websites keep a "robots.txt" file in their root directories to give web crawlers specific instructions on how to handle certain content on those sites. This text file is easy to generate, thanks to our automatic tool. In this guide, we'll help you understand how and why to use it.

What is a Robots.txt File?

Simply put, the robots.txt file is a small text document stored on your server. Its only purpose is to tell web crawlers whether certain content on your website should be crawled and indexed for search visibility.
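A robots.txt file is nothing more than a series of plain-text directives. As a minimal sketch (a generic example, not output from the generator), the following file tells every crawler that the entire site is open:

    # Applies to all crawlers
    User-agent: *
    # An empty Disallow value means nothing is off-limits
    Disallow: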

Why is a Robots.txt File Important?

A robots.txt file is a powerful component of your website, and it can drastically affect how your site performs in search. For example, if you don't know how to design your own robots.txt file or how exceptions and allowances work, you could be inadvertently hurting your search visibility.

After all, robots.txt files have massive sway over the way search engines treat your site. Using a robots.txt file, you can communicate directly with web crawlers about which sections of your website should be left alone because they aren't important for search.
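For instance, a file like the one below (with hypothetical directory names) asks every crawler to skip two sections while leaving the rest of the site open:

    User-agent: *
    # Hypothetical sections that add nothing to search results
    Disallow: /scripts/
    Disallow: /temp/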

How Web Crawlers Work

When a web crawler arrives at your website, the first thing it looks for is a robots.txt file. The crawler is designed to automatically "ask" your website which pages to ignore before crawling. Depending on the contents of the robots.txt file, the bot will crawl all of your pages, crawl only parts of your website, or leave the site altogether.
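Those three outcomes correspond to three simple directive patterns. Roughly (the /private/ path is a placeholder):

    # 1. Crawl everything: no restrictions
    User-agent: *
    Disallow:

    # 2. Crawl everything except one section
    User-agent: *
    Disallow: /private/

    # 3. Leave the site entirely
    User-agent: *
    Disallow: /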

Factors to Take into Consideration

In relation to the robots.txt file, there are a few things to keep in mind before you get started.

1. Determine if a robots.txt file is present on your website. You can quickly find out whether your site is using a robots.txt file by entering your URL followed by /robots.txt. For example, the address would look like "www.yoursite.com/robots.txt" (without the quotations). If the page loads and you see lines of text on the left side of the page, you have a robots.txt file in your site's directory. If the page loads and you see nothing on the screen after a considerable amount of time has passed, you have a robots.txt file, but there is nothing written in it. If the page fails to load, the file does not exist on your server. This is where our Robots.txt Generator can help.

2. If you have a robots.txt file, find out if it's keeping web crawlers from accessing important sections of your site. Inexperienced users sometimes attempt to write their own robots.txt files by hand, and they can cause more harm than good by accidentally telling crawlers to skip the wrong sections of their website. You can avoid this entirely by using our Robots.txt Generator and cutting out the guesswork.

3. Assess your website's need for a robots.txt file. On most websites, there are generally certain sections that shouldn't be crawled by search engines. A great example would be comment sections, where outside users can provide input and engage in discussion.
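If those comments live under their own path, a single Disallow line covers them. The directory name here is hypothetical:

    User-agent: *
    # Keep user-generated comment pages out of search indexes
    Disallow: /comments/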

There are a number of other situations in which a robots.txt file makes sense. For instance, you might want to temporarily block pages that are under construction, or you might want to give web crawlers further guidance on how to handle paid ads and links on your site. Those are just a couple of examples.

You might even find that you don't need a robots.txt file. However, your site must meet some hefty criteria in order to go without one. For example, you must trust that all content on your website is free of errors. Additionally, your website would have to contain no content that you wouldn't want surfacing in search for practically anyone with an internet connection.

Keep in mind that when you forego a robots.txt file, a pretty simple process happens when a search engine crawler arrives at your website. It initially checks for the robots.txt file, finds that it is not present, and proceeds to crawl every section of your website, since it has been given no instructions to do otherwise. From there, every single part of your website is searchable through the crawler's parent search engine.

Generating a Robots.txt File

Of course, you can always make your own robots.txt file, but if you don't understand the syntax used to communicate with search engine crawlers, you might do more harm than good. That's where our Robots.txt Generator comes in.

We'll provide a step-by-step guide to using the features of this tool and creating a robots.txt file that fits your particular needs.

  • Default - All Robots are: This setting takes an across-the-board approach, blocking or allowing access for all search engines. If you want to add exceptions, you can do so in the fields that follow. If you aren't sure which option to choose, think about it like this: if you want all or most search engines to crawl your website, choose "Allowed." If you want all or most search engines to stay away, choose "Refused." (A sample of the generated output appears after this list.)
  • Crawl-Delay: This feature controls how long a crawler should wait between visits to your website. There are a few things to keep in mind when choosing a value. Each time a crawler visits your website, your most recent content is reflected in its search engine, and crawlers are constantly revisiting websites to stay up to date. When you set no crawl delay, your robots.txt file essentially tells crawlers that the site can be visited and crawled as often as possible. While this sounds preferable, the continual, frequent requests can strain your server; this is why crawl-delay exists. For websites whose content changes rapidly, such as Reddit, Twitter, or Facebook, a low crawl delay is crucial. For websites that receive only periodic, occasional updates, a longer crawl delay makes sense. If you don't update your website on a daily basis, it's safe to keep the delay long.
  • Sitemap: If you are worried about crawlers not being able to reach the entirety of your website, you can help them along by disclosing a sitemap (if you have one). A sitemap contains a full list of links to every page on your website, and a web crawler can use it to reach every nook and cranny of your site. If your website isn't equipped with a sitemap, that's not a big deal, which is why we left this field optional. A sitemap simply helps when not all of your pages are already properly linked.
  • Search Robots: This is where you can customize which search engines to turn away or allow. The options you choose here depend heavily on your default setting for all search engines; think of these fields as exceptions to your default rule. For example, if you want to allow all search engines except Baidu to crawl your website, make sure your default setting at the top of the generator is "Allowed," and change Baidu's specific setting to "Refused." Likewise, if you want to refuse all search engines except, say, Google, change your default setting to "Refused," and change Google's specific setting to "Allowed."
  • Restricted Directories: In this area, you can tell web crawlers to overlook certain areas of your website. For example, if you use a directory on your site purely for unrelated file hosting, you can exclude it here. As a rule of thumb, the generator assumes you're describing paths from your directory's root (starting point). For example, if the directory you want ignored is located at "/root/tools/ignoreddirectory/," you only need to type "/tools/ignoreddirectory/" in the field. Note that you can exclude up to 6 directories here; once you generate your robots.txt file, you can go beyond that limit by copying the pattern and adding more directories to the exclusions.
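To tie the fields together, here is a sketch of what the generated file might look like for one hypothetical configuration: all robots allowed by default, a 10-second crawl delay, a sitemap, Baidu refused, and two restricted directories. All paths and values are illustrative:

    # Default rule: all robots allowed, with a delay and two exclusions
    User-agent: *
    Crawl-delay: 10
    Disallow: /tools/ignoreddirectory/
    Disallow: /temp/

    # Exception: refuse Baidu entirely (Baidu's crawler identifies as Baiduspider)
    User-agent: Baiduspider
    Disallow: /

    Sitemap: http://www.yoursite.com/sitemap.xml

A crawler follows the most specific User-agent group that matches it, which is how the per-engine exceptions override the default rule.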


How to Save, Upload, and Use your New Robots.txt File

Once your fields are all configured correctly, it's time to generate and use your new robots.txt file. You can do this in one of two ways: you can either log into your web server and create the robots.txt file directly on the server, or you can make one on your computer and upload it. We'll show you how to do both.

Creating Your Robots.txt File on Your Server

Log into your server and navigate to the root directory (the highest point in your filesystem). You should arrive there automatically after choosing to view your file hierarchy. Using the interface, choose to create a new text document. After the document has been created, paste the generated code from this page into it. Do not add, edit, or delete any lines without doing your research. Save the file, and rename it to "robots.txt" (with no quotations) before logging out of the server.

Creating Your Robots.txt File on Your Computer

Starting from the desktop, right-click anywhere and choose New > Text Document from the resulting menu. A new icon will appear on the desktop; rename it to "robots.txt" (with no quotations). After renaming it, double-click it to open it, then paste the generated code from this page into the text document. Do not tamper with the code. Click File > Save to save your changes.

Log into your web server via browser or FTP client and make sure you are at the root directory, then choose the "Upload" option. In an FTP client, you might see a file manager window on the left side of the screen that lets you browse your computer while viewing your server's files on the right side. Navigate to your robots.txt file (on the desktop) and upload it to the root directory of your server. In most FTP clients, you can simply drag the file from your local directory and drop it into your server's root directory, and the upload will begin.

How to Know if Your Robots.txt File is Working Correctly

You can easily test the file by using a search engine. For example, find a page in a directory that you excluded from crawling and paste some of its text into a search bar. Try placing quotation marks around it to help the search engine target the content. If the search engine turns up zero results, your robots.txt file is doing its job. If you see the page that should be excluded in the search results, you made a mistake somewhere along the way and need to backtrack. Keep in mind that a page indexed before you added the file may take some time to drop out of results, so give crawlers a chance to revisit before concluding that something is wrong.
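For instance, using the hypothetical excluded directory from earlier, a test query on most major engines could combine the site: operator with quoted text from the excluded page:

    site:www.yoursite.com "an exact sentence copied from the excluded page"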