Robby

  Please note that Robby is under development and not available for download yet.

Robby is a PHP class for caching and parsing robots.txt files.  It was created after a search for robots.txt parsing code in PHP turned up little: none of what was found did caching, and all of it failed to correctly parse valid robots.txt files containing in-line comments.
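
To illustrate the in-line comment problem, here is a minimal sketch of parsing a single robots.txt line.  Robby is not yet released, so the function name parseRobotsLine is hypothetical and is not Robby's API; it only shows one way to drop a trailing "# comment" before splitting the field from its value.

```php
<?php
// Hypothetical helper for illustration only; not part of Robby.
// Returns array(field, value) for a directive line, or null for
// blank lines, comment-only lines, and lines without a colon.
function parseRobotsLine($line)
{
    // Discard everything from the first '#' onward. This covers both
    // full-line comments and in-line comments after a directive.
    $hashPos = strpos($line, '#');
    if ($hashPos !== false) {
        $line = substr($line, 0, $hashPos);
    }

    $line = trim($line);
    if ($line === '') {
        return null;
    }

    // Split on the first colon only, so values such as URLs keep theirs.
    $parts = explode(':', $line, 2);
    if (count($parts) !== 2) {
        return null;
    }

    // Field names are case-insensitive; values are kept as written.
    return array(strtolower(trim($parts[0])), trim($parts[1]));
}
```

With this sketch, a line such as "Disallow: /private/ # keep bots out" yields the field "disallow" and the value "/private/", while a comment-only line yields null.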

Purpose

Adhering to the policy defined in robots.txt is how non-end-user HTTP clients demonstrate their "good behavior".  Ironically, fetching robots.txt before every file access would itself be bad behavior, and could get your client banned from the site's resources.  Robby therefore uses an aggressive caching strategy.  The most recently checked robots.txt file is kept in a live cache by the class for up to 20 seconds; after that, the file system and database caches are checked.  Robby will permanently cache robots.txt to the file system, a MySQLi database, or both (to allow transition from one to the other).
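
The lookup order described above can be sketched as follows.  This is an assumption-laden illustration rather than Robby's implementation: the class name TieredRobotsCache, its methods, the ".robots" file naming, and the fetcher callback are all hypothetical, and the database tier is left out to keep the example short.

```php
<?php
// Hypothetical sketch of a tiered robots.txt cache; not Robby's API.
class TieredRobotsCache
{
    const LIVE_TTL = 20; // seconds the last robots.txt stays in the live cache

    private $cacheDir;
    private $fetcher;          // callable($host) returning the robots.txt body
    private $liveHost = null;  // host of the most recently checked file
    private $liveBody = null;
    private $liveTime = 0;

    public function __construct($cacheDir, $fetcher)
    {
        $this->cacheDir = rtrim($cacheDir, '/');
        $this->fetcher = $fetcher;
    }

    // $now is injectable so the expiry logic can be exercised without sleeping.
    public function get($host, $now = null)
    {
        $now = ($now === null) ? time() : $now;

        // Tier 1: live cache, valid only for the last host and for LIVE_TTL.
        if ($this->liveHost === $host && ($now - $this->liveTime) <= self::LIVE_TTL) {
            return $this->liveBody;
        }

        // Tier 2: permanent file system cache (a database tier would go here too).
        $path = $this->cacheDir . '/' . md5($host) . '.robots';
        if (is_file($path)) {
            $body = file_get_contents($path);
        } else {
            // Tier 3: nothing cached, so fetch once and store permanently.
            $body = call_user_func($this->fetcher, $host);
            file_put_contents($path, $body);
        }

        // Refresh the live cache with whatever was just read.
        $this->liveHost = $host;
        $this->liveBody = $body;
        $this->liveTime = $now;

        return $body;
    }
}
```

In this sketch the remote site is contacted only when neither the live cache nor the file cache has an entry, which is the point of the strategy: repeated checks for the same host never turn into repeated requests for robots.txt.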

Applications using Robby are responsible for respecting any crawl-delay directive in a given site's robots.txt, and are strongly advised to do so.
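
As a rough sketch of what honoring crawl-delay might look like in an application, the code below reads the delay for a user agent and computes how long to wait before the next request.  The function names crawlDelayFor and secondsToWait are hypothetical and not part of Robby.  Crawl-delay is a non-standard extension, so the group matching used here (substring match on the agent name, with "*" as the fallback) is an assumption rather than a specification.

```php
<?php
// Hypothetical helpers for illustration only; not part of Robby.
// Returns the Crawl-delay in seconds for $userAgent, or null if none applies.
function crawlDelayFor($robotsTxt, $userAgent)
{
    $userAgent = strtolower($userAgent);
    $agents = array();   // user-agents named by the group currently being read
    $inRules = false;    // true once the current group has a non-User-agent line
    $specific = null;    // first delay from a group naming this agent
    $wildcard = null;    // first delay from a "*" group

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        // Strip in-line and full-line comments.
        $hashPos = strpos($line, '#');
        if ($hashPos !== false) {
            $line = substr($line, 0, $hashPos);
        }
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue;
        }
        $field = strtolower(trim($parts[0]));
        $value = trim($parts[1]);

        if ($field === 'user-agent') {
            // A User-agent line that follows rules starts a new group.
            if ($inRules) {
                $agents = array();
                $inRules = false;
            }
            $agents[] = strtolower($value);
            continue;
        }

        $inRules = true;
        if ($field === 'crawl-delay' && is_numeric($value)) {
            foreach ($agents as $agent) {
                if ($agent === '*') {
                    if ($wildcard === null) { $wildcard = (float) $value; }
                } elseif ($agent !== '' && strpos($userAgent, $agent) !== false) {
                    if ($specific === null) { $specific = (float) $value; }
                }
            }
        }
    }

    // A group that names the agent wins over the wildcard group.
    return ($specific !== null) ? $specific : $wildcard;
}

// Seconds still to wait before the next request to the same site.
function secondsToWait($delay, $lastRequestAt, $now)
{
    if ($delay === null) {
        return 0.0;
    }
    return max(0.0, ($lastRequestAt + $delay) - $now);
}
```

An application would typically record the time of its last request per host and sleep for the value returned by secondsToWait before issuing the next one.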

Support

You can ask for help with Robby on the mailing list.

License, Documentation:

View the current: README • License (GPL)

Version history:

Requirements, Download(s):

Robby requires PHP 5.3 or greater.  It can cache to a MySQLi database if one is available; otherwise it requires write access to the file system for caching.

Mailing List

Mailing list | Archives
