unrobotstxt 0.1.0

Translation of Google's robots exclusion protocol (robots.txt) parser and matcher

To use this package, run the following command in your project's root directory:

dub add unrobotstxt

Manual usage
Put the following dependency into your project's dependencies section:

"unrobotstxt": "~>0.1.0"

This is a D translation of Google's robots exclusion protocol (robots.txt) parser and matcher. It's derived from Google's open source project, but not affiliated with Google in any way.


Features

  • Matches Google's (open source) implementation of the robots.txt standard
  • Available as a library or a standalone test tool
  • @safe

Standalone tool

The standalone tool can be used to test a robots.txt file, to see whether it blocks or allows the URLs you expect.

Usage example
$ wget https://dlang.org/robots.txt
$ cat robots.txt 
User-agent: *
Disallow: /phobos-prerelease/
Disallow: /library-prerelease/
Disallow: /cutting-edge/
$ robotstxt robots.txt MyBotName /index.html
user-agent 'MyBotName' with URI '/index.html': ALLOWED
$ robotstxt robots.txt MyBotName /cutting-edge/index.html
user-agent 'MyBotName' with URI '/cutting-edge/index.html': DISALLOWED

Run dub build from the repo root. You can put the resulting robotstxt binary somewhere in your PATH.

Alternatively, download, build and run from the DUB registry with dub run unrobotstxt.


Usage example
import std;

import unrobotstxt;

void main()
{
	const robots_txt = readText("robots.txt");
	auto matcher = new RobotsMatcher();
	if (matcher.AllowedByRobots(robots_txt, ["MyBotName"], "/index.html"))
	{
		// Do bot stuff
	}
}

There's no API for parsing once and then making multiple URL checks.

For pure Google-style parsing (no matching), you can also implement the callbacks in the RobotsParseHandler abstract class and pass it to ParseRobotsTxt.
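For example, a minimal handler might look like the sketch below. The method names and signatures here are assumptions based on the C++ RobotsParseHandler interface, which this library says it mirrors; check the generated docs for the exact D signatures.

```d
import std.stdio;
import unrobotstxt;

// Hypothetical handler that logs each rule as it's parsed.
// Method names assume the D API mirrors the C++ RobotsParseHandler
// interface (HandleUserAgent, HandleAllow, HandleDisallow, etc.).
class LoggingHandler : RobotsParseHandler
{
	override void HandleRobotsStart() {}
	override void HandleRobotsEnd() {}

	override void HandleUserAgent(int lineNum, string value)
	{
		writefln("line %d: user-agent '%s'", lineNum, value);
	}

	override void HandleAllow(int lineNum, string value)
	{
		writefln("line %d: allow '%s'", lineNum, value);
	}

	override void HandleDisallow(int lineNum, string value)
	{
		writefln("line %d: disallow '%s'", lineNum, value);
	}

	override void HandleSitemap(int lineNum, string value) {}
	override void HandleUnknownAction(int lineNum, string key, string value) {}
}

void main()
{
	ParseRobotsTxt("User-agent: *\nDisallow: /private/\n", new LoggingHandler);
}
```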


See the generated docs. The example above is pretty much what you get, though.

The code supports a StrictSpelling version that corresponds to the kAllowFrequentTypos global boolean in the original C++ version. It disables some typo tolerance (e.g., accepting "Disalow" for "Disallow"), but still allows various other parsing quirks. Otherwise the API matches the original C++ code.
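To enable that version identifier in a dub project, you could add it to your dub.json alongside the dependency. This is a sketch assuming a standard dub setup; the "versions" key is dub's normal mechanism for defining version identifiers.

```json
{
	"name": "mybot",
	"dependencies": {
		"unrobotstxt": "~>0.1.0"
	},
	"versions": ["StrictSpelling"]
}
```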


Bug fixes and misc. improvements are welcome, but make a fork if you want to extend/change the API in ways that don't match the original. I've named this project unrobotstxt to leave the robotstxt name available for a project with a more idiomatic API.

  • Simon Arneaud
0.1.0 2020-Jul-03