# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# greenit.danielaknoll.de robots.txt: Version 2025
#
# Use meta tags if you do not want images to be crawled
#
# Sitemap location at the very bottom
#
# Bad bots read robots.txt to find out where they are not allowed to go.
# It is better to create an allowlist (User-Agent: ... Allow: ...)
# and to disallow everyone else: User-Agent: * Disallow: *
# * = any characters; $ = end of line
#
# Some bots
# Googlebot
# Bingbot
# Yandex Bot
# Apple Bot
# Facebook External Hit
# DuckDuck Bot
# Baidu Spider
# Sogou Spider
# Swiftbot
# Slurp Bot
# CCBot
#
# TABLE OF CONTENTS
# - Allowlist
# - Bots that copy entire sites (found somewhere on the web)
# - AI bots
#
# - Block everyone else, after the allowlist has been defined
# - Sitemap
# - Crawl delay
#
# +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
# There is a "Disallow the rest" section at the very bottom

###### ALLOW ######

###### SITEMAP ######
# Tell all crawlers where the sitemap is located. This is updated automatically (updated 2024-06)
Sitemap: https://greenit.danielaknoll.de/sitemap.xml

# Crawl delay
# ------------------------------------------------------------
# Crawl-delay: 26

# Disallow images and allow CSS crawling
# -----------------------------------------------------------
#
User-agent: *
Disallow: /images/
Allow: /css/

# Allowlist - crawling allowed
# -----------------------------------------------------------
#
#
# Bing
User-agent: BingPreview
Allow: /
Crawl-delay: 5

# Bing
User-agent: Bingbot
Allow: /
Crawl-delay: 5

# Apple
User-agent: Applebot
Allow: /
Crawl-delay: 20

# DuckDuckGo
User-agent: DuckDuckBot
Allow: /
Crawl-delay: 10

# Google
User-agent: Googlebot
Allow: /
Crawl-delay: 15

User-agent: Googlebot-News
Allow: /
Crawl-delay: 10

# Slurp (Yahoo)
User-agent: Slurp
Allow: /
Crawl-delay: 1

# Professional SEO tools
# cognitiveSEO
User-agent: cognitiveSEO
Allow: /

# Lumar
User-agent: Lumar
Allow: /

# MJ12bot (backlink crawler)
User-agent: MJ12bot
Allow: /

# Screaming Frog SEO Spider
User-agent: Screaming Frog SEO Spider
Allow: /

###### DISALLOW ######

# curl/wget, added 2017-10
User-agent: curl
Disallow: /

User-agent: wget
Disallow: /

User-agent: Scrapy
Disallow: /

# Added 2018-04:
# User-agent: researchscan.comsys.rwth-aachen.de
User-agent: Researchscan*
Disallow: /

###### AI bots, updated 2024-13 ######
# Inspired by the robots.txt files of the following websites:
# www.seo-suedwest.de/8890-zahlreiche-websites-sperren-chatgpt-bereits-per-robots-txt-aus.html
# stackoverflow.com/robots.txt
# www.sueddeutsche.de/robots.txt

# Exclude SEO tools & spam bots - updated 2024-12-13
User-agent: 008
User-agent: AhrefsBot # AI
User-agent: Amazonbot
User-agent: anthropic-ai # AI
User-agent: Applebot-Extended
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: backlink-check.de
User-agent: BacklinkCrawler
User-agent: BLP_bbot/0.1
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User # AI
User-agent: ClaudeBot # AI
User-agent: Claude-Web # AI
User-agent: cohere-ai # AI
User-agent: DataForSeoBot
User-agent: Diffbot
User-agent: ExtractorPro
User-agent: FacebookBot
User-agent: Fasterfox
User-agent: FriendlyCrawler
User-agent: Googlebot-Image # AI
User-agent: Google-Extended # Google's generative AI crawlers
User-agent: GPTBot # AI
User-agent: GumGum Bot # AI
User-agent: ImagesiftBot # AI
User-agent: LinkextractorPro
User-agent: LinkWalker
User-agent: magpie-crawler
User-agent: Mediapartners-Google*
User-agent: MegaIndex.ru/2.0
User-agent: Meta-ExternalAgent
User-agent: meta-externalagent
User-agent: NewsNow
User-agent: news-please
User-agent: OAI-SearchBot
User-agent: omgilibot
User-agent: Openbot
User-agent: peer39_crawler*
User-agent: PerplexityBot # AI
User-agent: Quora-Bot
User-agent: rogerbot
User-agent: searchpreview
User-agent: SemrushBot
User-agent: SEODAT
User-agent: SEOENGBot
User-agent: SEOkicks-Robot
User-agent: Twitterbot
User-agent: True_Robot
User-agent: TurnitinBot
User-agent: um-IC # Uber Metrics
User-agent: URL Control
User-agent: URL_Spider_Pro
User-agent: voltron
User-agent: xovi
User-agent: Yahoo Pipes 1.0
Disallow: /

# Temporarily disabled for testing - see schreiben.* where it is active.
# Disallow the rest
# -----------------------------------------------------------
#
# User-agent: *   # Block all web robots
# Disallow: /     # Block the root directory

###### LEGAL NOTICE ######
# Legal notice (wording inspired by sueddeutsche.de): daniela-knoll.de expressly reserves
# the right to use its content for commercial text and data mining (§ 44 b UrhG).
# The use of robots or other automated means to access daniela-knoll.de or collect or mine data without
# the express permission of daniela-knoll.de is strictly prohibited.
# daniela-knoll.de may, in its discretion, permit certain automated access to certain daniela-knoll.de pages.
# If you would like to apply for permission to crawl daniela-knoll.de, collect or use data, please email info@daniela-knoll.de
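
# ------------------------------------------------------------
# A quick way to sanity-check the rules above is Python's built-in
# urllib.robotparser (a minimal sketch, kept as a comment so this file
# stays a valid robots.txt; the example URLs and checks are illustrative
# and are not directives of this file):
#
#   from urllib import robotparser
#
#   rp = robotparser.RobotFileParser()
#   rp.set_url("https://greenit.danielaknoll.de/robots.txt")
#   rp.read()
#
#   # GPTBot sits in the big Disallow group above, Googlebot in the allowlist
#   print(rp.can_fetch("GPTBot", "https://greenit.danielaknoll.de/"))     # expected: False
#   print(rp.can_fetch("Googlebot", "https://greenit.danielaknoll.de/"))  # expected: True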