Enter a website URL (e.g., https://ddginc-usa.com or just ddginc-usa.com):
The robots.txt file provides instructions to web crawlers about which parts of a site should not be accessed or crawled. It is publicly accessible at the root of a domain (e.g., https://ddginc-usa.com/robots.txt).
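Because the file always sits at the root of the domain, its address can be derived from whatever form of URL is entered. The following is a minimal sketch of that step, not the tool's actual implementation; the function name `robots_txt_url` is hypothetical, and defaulting to `https` when no scheme is given is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(site: str) -> str:
    """Return the robots.txt URL for a site given with or without a scheme."""
    site = site.strip()
    # Assumption: fall back to https when the scheme is omitted.
    if "://" not in site:
        site = "https://" + site
    parts = urlsplit(site)
    if not parts.netloc:
        raise ValueError(f"not a valid website URL: {site!r}")
    # robots.txt lives at the root, so any path, query, or fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("ddginc-usa.com"))
print(robots_txt_url("https://ddginc-usa.com/about?x=1"))
```

Both calls resolve to the same root-level address, which is the only location crawlers check.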
Common directives:
User-agent: * - applies the rules that follow to all crawlers
Disallow: / - blocks access to the entire site
Allow: /path/ - explicitly allows access to a path
Sitemap: https://ddginc-usa.com/sitemap.xml - points bots to your sitemap
Crawl-delay: 10 - asks bots to wait the given number of seconds (here 10) between requests

Common user agents:
Googlebot
Googlebot-Image
Googlebot-News
Googlebot-Video
bingbot
BingPreview
Slurp
DuckDuckBot
Baiduspider
Yandex
Sogou web spider
Exabot
MojeekBot
Qwantify
facebookexternalhit
Twitterbot
LinkedInBot
Pinterestbot
AhrefsBot
SemrushBot
MJ12bot
DotBot
BLEXBot
ia_archiver
Screaming Frog SEO Spider
Googlebot-Mobile
Google Page Speed Insights
W3C_Validator
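
To see how the directives and user agents above interact, the sketch below parses a small robots.txt with Python's standard `urllib.robotparser` and checks a few of the listed crawlers against it. The file content in `SAMPLE` is a made-up illustration, not the real robots.txt of ddginc-usa.com, and the helper `allowed` is a hypothetical name. Note that Python's parser applies rules in file order (the first matching rule wins), which is why the `Allow` line is placed before the broader `Disallow`; other crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Allow: /private/help/
Disallow: /private/
Crawl-delay: 10

User-agent: AhrefsBot
Disallow: /

Sitemap: https://ddginc-usa.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Report whether the given user agent may fetch the given path."""
    return parser.can_fetch(agent, "https://ddginc-usa.com" + path)


for agent in ("Googlebot", "bingbot", "AhrefsBot"):
    for path in ("/index.html", "/private/report", "/private/help/faq"):
        print(f"{agent:10} {path:20} allowed={allowed(agent, path)}")

# Crawl-delay and Sitemap values as read by the parser.
print("Crawl-delay for Googlebot:", parser.crawl_delay("Googlebot"))
print("Sitemaps:", parser.site_maps())
```

In this sample, crawlers without their own group fall back to the `User-agent: *` rules, while AhrefsBot matches its dedicated group and is blocked everywhere. `site_maps()` requires Python 3.8 or later.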