Robots.txt
Definition
Robots.txt is a simple text file placed in a website's root directory that provides instructions to search engine crawlers about which areas of the site should not be processed or scanned. It's part of the Robots Exclusion Protocol (REP), a group of web standards that regulate how robots crawl the web.
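For illustration, a minimal robots.txt might look something like this (the paths and the sitemap URL are placeholders, not recommendations for any particular site):

    User-agent: *
    Disallow: /admin/
    Disallow: /tmp/
    Allow: /admin/public/

    Sitemap: https://www.example.com/sitemap.xml

Directives are grouped under a User-agent line naming the crawler they apply to (an asterisk matches all crawlers); Disallow and Allow rules match URL path prefixes, and the optional Sitemap line points crawlers to your sitemap file.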
Google Search Console includes a robots.txt Tester tool, which lets you check whether your robots.txt file is configured correctly and see how specific URLs are affected by it. Used well, robots.txt helps you manage your site's crawl budget and steer crawlers away from duplicate or low-value content. Keep in mind that it controls crawling rather than indexing: a URL blocked in robots.txt can still be indexed if other pages link to it, so use a noindex directive or authentication for content that must stay out of search results.
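If you want to check URLs programmatically rather than in Search Console, Python's standard library ships a robots.txt parser. A minimal sketch, assuming the file lives at the conventional root location (the example.com URLs and user agent names are placeholders):

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's robots.txt from its root directory
    parser = RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")
    parser.read()

    # Ask whether a given crawler is allowed to request a specific URL
    print(parser.can_fetch("Googlebot", "https://www.example.com/admin/settings"))
    print(parser.can_fetch("*", "https://www.example.com/blog/robots-txt-guide"))

can_fetch returns True or False based on the rules that apply to the named user agent, which mirrors how a well-behaved crawler decides whether to request a page.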
Related Terms
Crawling
Crawling is the process by which search engines discover and scan web pages, collecting information to index and rank content.
Indexing
Indexing is the process by which search engines add your web pages to their database, making them eligible to appear in search results.
Sitemap
A Sitemap is a file that lists a website's important pages, images, and files, helping search engines understand your site structure and content.