What is robots.txt?
Robots.txt is a simple text file that contains directives for search engine crawlers. It tells them which pages and files they may crawl and index and which they may not. This helps manage your site's SEO by keeping unwanted or sensitive pages out of search results.
Be careful when changing robots.txt: a single incorrect directive can block sections of the site that you need indexed.
Search engines regularly check a site's robots.txt file for instructions. If the file is missing or contains no directives that apply to the crawler, the crawler will traverse the entire site.
How to create a robots.txt file?
Creating a robots.txt file is simple: open a text editor (Notepad, Notepad++, VS Code) and save the file under the name "robots.txt". Then upload it to your site's root directory so search engines can find it.
The file should be available at: https://yoursite.ru/robots.txt
Robots.txt syntax
The syntax of robots.txt is simple: each group of rules begins with a "User-agent" directive, followed by one or more "Disallow" or "Allow" directives, each with the path to the page or file that needs to be blocked or allowed.
Example rules for robots.txt:
User-agent: *
Disallow: /private/
Allow: /public/
In this case, no crawler may crawl pages in the "private" directory, but every crawler may crawl pages in the "public" directory.
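One way to confirm how these three lines behave is to feed them to Python's standard urllib.robotparser module. The sketch below is only an illustration, not a substitute for the search engines' own testing tools: the helper name is_allowed, the bot name AnyBot and the sample URLs are assumptions made for the example. Keep in mind that this parser applies rules in file order and does not interpret the * and $ symbols inside paths.

```python
from urllib.robotparser import RobotFileParser

# The example rules from the article.
RULES = """\
User-agent: *
Disallow: /private/
Allow: /public/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse from memory, no network access
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs on the placeholder domain used in the article.
    for url in (
        "https://yoursite.ru/private/report.html",
        "https://yoursite.ru/public/index.html",
    ):
        print(url, is_allowed(RULES, "AnyBot", url))
```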
Symbol meanings:
* matches any sequence of characters.
$ marks the end of the URL.
# starts a comment.
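The meaning of these symbols can be sketched by translating a rule path into a regular expression. This is a simplified illustration under stated assumptions: the function names strip_comment, pattern_to_regex and path_matches are made up for the example, and real crawlers may treat edge cases (such as a $ in the middle of a pattern) differently.

```python
import re


def strip_comment(line: str) -> str:
    """Drop everything from '#' onward and trim surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' becomes '.*'; a trailing '$' anchors the match at the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the URL path matches the pattern from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(strip_comment("Disallow: /tmp/ # temporary files"))
    print(path_matches("/*.pdf$", "/docs/file.pdf"))
    print(path_matches("/*.pdf$", "/docs/file.pdf?download=1"))
```

Without the trailing $, the pattern /*.pdf would also match /docs/file.pdf?download=1, because a rule path is otherwise treated as a prefix.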
Let's take a closer look at the robots.txt directives:
User-agent
Names the robot to which the rules that follow apply.
Disallow
Prohibits crawling sections or individual pages of the site.
Allow
Allows crawling of sections or individual pages of the site.
Sitemap
Specifies the path to the Sitemap file located on the site.
Clean-param
Tells the robot that the page URL contains parameters (for example, UTM tags) that should be ignored when indexing. This directive is specific to Yandex.
Crawl-delay
Sets the minimum interval (in seconds) that the robot must wait between finishing loading one page and starting to load the next. Support varies between search engines; Google, for example, ignores this directive.
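To see how some of these directives are read by a program, the sketch below parses a small sample with Python's standard urllib.robotparser and returns the Crawl-delay and Sitemap values. The sample rules and the function name summarize are assumptions for illustration; site_maps() requires Python 3.8 or later, and this parser does not read the Clean-param directive.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical sample combining several directives described above.
SAMPLE = """\
User-agent: *
Disallow: /search
Crawl-delay: 2
Sitemap: https://example.ru/sitemap.xml
"""


def summarize(rules: str, user_agent: str = "*") -> dict:
    """Return the crawl delay for user_agent and the listed sitemaps."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return {
        "crawl_delay": parser.crawl_delay(user_agent),  # None if not set
        "sitemaps": parser.site_maps(),  # None if no Sitemap lines
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```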
Yandex requirements for the robots.txt file
- The file size should not exceed 500 KB.
- The file must be in TXT format and named robots.txt.
- The file must be placed in the root directory of the site.
- The file must be accessible to robots. The server on which the site is located should return an HTTP 200 OK status code.
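The requirements above can be turned into a small pre-flight check. The sketch below is an assumption-laden illustration: the function name check_robots_response is hypothetical, and the limit is interpreted as 500 × 1024 bytes, which may not be exactly how Yandex counts it. The status code and body can come from any HTTP client.

```python
from typing import List

# Assumption: "500 KB" is interpreted here as 500 * 1024 bytes.
MAX_SIZE_BYTES = 500 * 1024


def check_robots_response(url_path: str, status: int, body: bytes) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if url_path != "/robots.txt":
        problems.append("file must be named robots.txt and sit in the site root")
    if status != 200:
        problems.append(f"server returned HTTP {status}, expected 200")
    if len(body) > MAX_SIZE_BYTES:
        problems.append(f"file is {len(body)} bytes, limit is {MAX_SIZE_BYTES}")
    return problems


if __name__ == "__main__":
    # Hypothetical responses: one healthy, one missing file.
    print(check_robots_response("/robots.txt", 200, b"User-agent: *\nDisallow:\n"))
    print(check_robots_response("/robots.txt", 404, b""))
```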
Errors in robots.txt and their consequences
Blocking important pages
Errors in robots.txt can lead to important pages on your site being blocked, reducing your visibility in search engines. Make sure you check the rules in the file carefully to prevent such problems.
Conflicting rules
Conflicting rules in robots.txt can confuse robots, which will affect the indexing of your site. Make sure the rules in your robots.txt file are clear and consistent.
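When an Allow rule and a Disallow rule both match a URL, the Robots Exclusion Protocol standard (RFC 9309) and Google's documentation describe the same tie-breaker: the rule with the longest matching path wins, and Allow wins if the lengths are equal. The sketch below illustrates that precedence for plain path prefixes only (no * or $ handling); the function name decide and the sample paths are assumptions for the example.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # (directive, path prefix), e.g. ("Disallow", "/catalog/")


def decide(rules: List[Rule], path: str) -> bool:
    """Return True if path may be crawled under longest-match precedence."""
    best_len = -1
    allowed = True  # a path that no rule matches is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        is_allow = directive.lower() == "allow"
        # A longer prefix is more specific; on a tie, Allow takes precedence.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


if __name__ == "__main__":
    rules = [("Disallow", "/catalog/"), ("Allow", "/catalog/sale/")]
    print(decide(rules, "/catalog/sale/item-1"))  # the longer Allow rule applies
    print(decide(rules, "/catalog/item-2"))       # only the Disallow rule matches
```

Writing rules so that each URL is matched by one clearly most specific rule avoids relying on this tie-breaker at all.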
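Missing rules for crawlers
A crawler follows the group of rules addressed to it by name and falls back to the "User-agent: *" group only when no such group exists. If you add a separate group for a particular crawler and leave out rules it still needs, it may crawl sections you meant to close, which leads to indexing problems. Make sure every important crawler is covered by appropriate rules in your robots.txt file.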
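The effect of a named group can be checked with Python's standard urllib.robotparser. In the sketch below, which uses hypothetical rules and URLs, Googlebot has its own group, so the /private/ restriction from the general group does not apply to it, while a crawler without its own group (called OtherBot here purely for illustration) falls back to the general rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a general group plus a separate group for Googlebot.
RULES = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse rules from a string without touching the network."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(RULES)
    url = "https://yoursite.ru/private/page.html"
    print("Googlebot:", parser.can_fetch("Googlebot", url))
    print("OtherBot:", parser.can_fetch("OtherBot", url))
```

If Googlebot should also stay out of /private/, the Disallow: /private/ line has to be repeated inside its own group.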
You should not block JavaScript or CSS files using robots.txt. Bots may not display content correctly on your site if they don't have access to these resources.
How to check and fix errors in robots.txt
To check the robots.txt file for errors, you can use various tools, such as Google Search Console or Yandex.Webmaster. These tools provide information about the rules in your robots.txt file and help you determine if there are any problems.
Robots.txt file analysis tool from Yandex
https://webmaster.yandex.ru/tools/robotstxt/
Google's robots.txt file verification tool
https://support.google.com/webmasters/answer/6062598
Cyrillic characters in robots.txt
Cyrillic characters must not be used in the robots.txt file. To convert a Cyrillic domain name, use a Punycode converter.

To convert Cyrillic paths, use URL (percent) encoding.

Wrong
User-agent: *
Disallow: /каталог/
Sitemap: https://вашсайт.рф/sitemap.xml
Right
User-agent: *
Disallow: /%D0%BA%D0%B0%D1%82%D0%B0%D0%BB%D0%BE%D0%B3/
Sitemap: https://xn--80aae4a1bi2b.xn--p1ai/sitemap.xml
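Both conversions can also be done without an online converter. The sketch below uses only the Python standard library; the function names encode_path and encode_host are made up for the example, and the Cyrillic inputs каталог and вашсайт.рф are the decoded forms of the encoded values shown in the "Right" example.

```python
from urllib.parse import quote


def encode_path(path: str) -> str:
    """Percent-encode a URL path as UTF-8, leaving '/' untouched."""
    return quote(path, safe="/")


def encode_host(host: str) -> str:
    """Convert an internationalized domain name to its Punycode (ASCII) form."""
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print("Disallow:", encode_path("/каталог/"))
    print("Sitemap: https://" + encode_host("вашсайт.рф") + "/sitemap.xml")
```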
Manuals on robots.txt from Yandex and Google
Yandex documentation on the robots.txt file
https://yandex.ru/support/webmaster/controlling-robot/robots-txt.html
Google documentation on the robots.txt file
https://developers.google.com/search/docs/crawling-indexing/robots/intro?hl=ru
Example robots.txt that blocks crawling of the entire site
User-agent: *
Disallow: /
Example robots.txt for WordPress
User-agent: *
Disallow: /cgi-bin
Disallow: */?
Disallow: /wp-
Disallow: *?s=
Disallow: *&s=
Disallow: /search
Disallow: /author/
Disallow: *?attachment_id=
Disallow: */trackback
Disallow: */feed
Disallow: */embed
Disallow: */page/
Allow: /wp-content/plugins/
Allow: /wp-content/themes/
Allow: /wp-content/cache/
Allow: /wp-includes/
Allow: */uploads
Allow: /*/*.js
Allow: /*/*.css
Allow: /wp-*.png
Allow: /wp-*.jpg
Allow: /wp-*.jpeg
Allow: /wp-*.gif
Sitemap: https://example.ru/sitemap.xml
Example robots.txt for Bitrix
User-Agent: *
Disallow: */index.php$
Disallow: /bitrix/
Disallow: /personal/
Disallow: */cgi-bin/
Disallow: /local/
Disallow: /test/
Disallow: /*show_include_exec_time=
Disallow: /*show_page_exec_time=
Disallow: /*show_sql_stat=
Disallow: /*bitrix_include_areas=
Disallow: /*clear_cache=
Disallow: /*clear_cache_session=
Disallow: /*ADD_TO_COMPARE_LIST
Disallow: /*ORDER_BY
Disallow: /*?print=
Disallow: /*?list_style=
Disallow: /*?sort=
Disallow: /*sort_by=
Disallow: /*?set_filter=
Disallow: /*?arrFilter=
Disallow: /*?order=
Disallow: /*&print=
Disallow: /*print_course=
Disallow: /*?action=
Disallow: /*&action=
Disallow: /*register=
Disallow: /*forgot_password=
Disallow: /*change_password=
Disallow: /*login=
Disallow: /*logout=
Disallow: /*auth=
Disallow: */auth/
Disallow: /*backurl=
Disallow: /*back_url=
Disallow: /*BACKURL=
Disallow: /*BACK_URL=
Disallow: /*back_url_admin*
Disallow: /*?utm_source=
Disallow: */order/
Disallow: /*download
Disallow: /test.php
Disallow: */filter/*/apply/
Disallow: /*setreg=
Disallow: /*logout
Disallow: */filter/
Disallow: /*sphrase_id
Disallow: */search/
Disallow: /*type=
Disallow: /*?product_id=
Disallow: /*?display=
Disallow: /*?view_mode=
Disallow: /*view=
Disallow: /*min_price=
Disallow: /*max_price=
Disallow: /*&page=
Disallow: /*?path=
Disallow: /*?route=
Disallow: /*?products_on_page=
Disallow: /*?PAGEN_1=1$
Disallow: /*?PAGEN_1=1/$
Disallow: /*?new=
Disallow: /*?edit=
Disallow: /*?preview=
Disallow: /*SHOWALL=
Disallow: /*SHOW_ALL=
Disallow: /*SHOWBY=
Disallow: /*SPHRASE_ID=
Disallow: /*TYPE=
Disallow: /*?utm*=
Disallow: /*&utm*=
Disallow: /*?VIEW=
Disallow: /*?SORT_TO=
Disallow: /*?SORT_FIELD=
Disallow: /*set_filter=
Disallow: */auth.php
Disallow: /*?alfaction=
Disallow: /*?oid=
Disallow: /*?name=
Disallow: /*?form_id=
Disallow: /*&form_id=
Disallow: /*?bxajaxid=
Disallow: /*&bxajaxid=
Disallow: /*?view_result=
Disallow: /*&view_result=
Disallow: */resize_cache/
Disallow: /*?linerow=
Disallow: /bitrix/panel/
Disallow: *?sort_ord=
Disallow: *?sort_dir=
Disallow: *?category_id=
Disallow: *?item_id=
Disallow: *?pn_pr=
Disallow: *?page=
Disallow: *?tab=
Disallow: *?display=
Disallow: *?linerow=
Disallow: *?year=
Disallow: *?oid=
Disallow: */filter/
Disallow: *showElements*
Disallow: *PAGEN_2*
Disallow: *?ORDER_ID=
Disallow: *how=*
Disallow: */form/?name=
Disallow: *?name=
Disallow: /*gclid*
Disallow: /*yclid*
Disallow: /*ymclid*
Disallow: /test*
Disallow: /404.php
Disallow: /api/*
Disallow: /*?RID*
Disallow: *?preview=
Disallow: *bitrix_*=
Disallow: *auth=
Disallow: /*?tag
Disallow: /*set_filter*
Disallow: /*?showElements=
Disallow: /*?tid*
Disallow: /*&tid*
Disallow: *?FILTER*=
Disallow: *?ei=
Disallow: *?p=
Disallow: *?q=
Disallow: *?tags=
Disallow: *B_ORDER=
Disallow: *BRAND=
Disallow: *CLEAR_CACHE=
Disallow: *SECTION_ID=
Disallow: *section_id=
Disallow: *SECTION[*]=
Disallow: *SHOW_ALL=
Disallow: *SHOWBY=
Disallow: *SORT=
Disallow: *SPHRASE_ID=
Disallow: *TYPE=
Disallow: /*?from*
Disallow: /*&from*
Disallow: /*block=*
Disallow: *r1=
Disallow: */?_ym_debug
Disallow: */apply/*
Disallow: *&by*
Disallow: *?by*
Disallow: *?id=*
Disallow: *?a=*
Disallow: *?amp*
Disallow: *IBLOCK_ID=*
Disallow: *RESULT_ID=*
Disallow: *PROPERTY=*
Disallow: *IN_STOCK=*
Disallow: *SECTION_CODE=*
Disallow: *SIZE=*
Disallow: *added=*
Disallow: *position=*
Disallow: *callibri=*
Disallow: *gtm_debug=*
Disallow: *placement=*
Disallow: *source=*
Disallow: *&adv=*
Disallow: *?adv=*
Disallow: *option=*
Disallow: *?hhtmFrom=*
Disallow: *?_r=*
Disallow: *sort_order=*
Allow: /upload/*
Allow: /bitrix/components/
Allow: /bitrix/cache/
Allow: /bitrix/js/
Allow: /bitrix/templates/
Allow: /bitrix/*.js
Allow: /bitrix/*.css
Allow: /local/components/
Allow: /local/cache/
Allow: /local/js/
Allow: /local/templates/
Allow: /local/*.js
Allow: /local/*.css
Allow: /local/*.jpg
Allow: /local/*.jpeg
Allow: /local/*.png
Allow: /local/*.gif
Sitemap: https://example.ru/sitemap.xml
Robots.txt example for MODX
User-agent: *
Disallow: /cgi-bin
Disallow: /manager/
Disallow: /assets/
Disallow: /core/
Disallow: /connectors/
Disallow: /index.php
Disallow: *?
Allow: /assets/*.jpg
Allow: /assets/*.jpeg
Allow: /assets/*.gif
Allow: /assets/*.png
Allow: /assets/*.pdf
Allow: /assets/*.js
Allow: /assets/*.css
Allow: /assets/*.svg
Sitemap: https://example.ru/sitemap.xml
Example robots.txt for OpenCart
User-agent: *
Disallow: /*route=account/
Disallow: /*route=affiliate/
Disallow: /*route=checkout/
Disallow: /*route=product/search
Disallow: /index.php
Disallow: /admin
Disallow: /catalog
Disallow: /download
Disallow: /export
Disallow: /system
Disallow: /*?sort=
Disallow: /*&sort=
Disallow: /*?order=
Disallow: /*&order=
Disallow: /*?limit=
Disallow: /*&limit=
Disallow: /*?filter_name=
Disallow: /*&filter_name=
Disallow: /*?filter_sub_category=
Disallow: /*&filter_sub_category=
Disallow: /*?filter_description=
Disallow: /*&filter_description=
Disallow: /*?tracking=
Disallow: /*&tracking=
Disallow: /*?page=
Disallow: /*&page=
Disallow: /wishlist
Disallow: /login
Sitemap: http://example.ru/sitemap.xml
Robots.txt example for Joomla
User-agent: *
Disallow: /administrator/
Disallow: /bin/
Disallow: /cache/
Disallow: /cli/
Disallow: /components/
Disallow: /includes/
Disallow: /installation/
Disallow: /language/
Disallow: /layouts/
Disallow: /libraries/
Disallow: /logs/
Disallow: /media/
Disallow: /modules/
Disallow: /plugins/
Disallow: /templates/
Disallow: /tmp/
Disallow: /index.php* # Only if you have SEF enabled
Allow: /index.php?option=com_xmap&sitemap=1&view=xml
Sitemap: http://example.ru/sitemap.xml
Robots.txt example for Drupal
User-agent: *
# CSS, JS, Images
Allow: /core/*.css$
Allow: /core/*.css?
Allow: /core/*.js$
Allow: /core/*.js?
Allow: /core/*.gif
Allow: /core/*.jpg
Allow: /core/*.jpeg
Allow: /core/*.png
Allow: /core/*.svg
Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png
Allow: /profiles/*.svg
# Directories
Disallow: /core/
Disallow: /profiles/
# Files
Disallow: /README.txt
Disallow: /web.config
# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /filter/tips
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login
Disallow: /user/logout
# Paths (no clean URLs)
Disallow: /index.php/admin/
Disallow: /index.php/comment/reply/
Disallow: /index.php/filter/tips
Disallow: /index.php/node/add/
Disallow: /index.php/search/
Disallow: /index.php/user/password
Disallow: /index.php/user/register
Disallow: /index.php/user/login
Disallow: /index.php/user/logout
Disallow: /drupal-9-migration
Disallow: /drupal-migration-services
Disallow: /drupal-7-end-of-life
Disallow: /drupal-migration-rescue
Sitemap: http://example.ru/sitemap.xml
Robots.txt example for Magento
User-agent: *
Disallow: /catalogsearch/
Disallow: /search/
Disallow: /customer/account/login/
Disallow: /*?SID=
Disallow: /*?PHPSESSID=
Disallow: /*?price=
Disallow: /*&price=
Disallow: /*?color=
Disallow: /*&color=
Disallow: /*?material=
Disallow: /*&material=
Disallow: /*?size=
Disallow: /*&size=
Sitemap: http://example.ru/sitemap.xml