Please note that Robby is under development and not available for download yet.
Robby is a PHP class for caching and parsing robots.txt files. It was created after a search for robots.txt parsing code in PHP turned up little: none of the code found did any caching, and all of it failed to correctly parse valid robots.txt files containing in-line comments.
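Because Robby is not yet available, the following is only a minimal sketch of the in-line comment handling described above, not Robby's actual code. The function name `parse_robots_lines` and its return shape are assumptions made for illustration. It treats everything from a `#` to the end of the line as a comment, including when the `#` follows a directive.

```php
<?php
// Hypothetical helper, NOT part of Robby: turns robots.txt text into a list
// of array(field, value) pairs, ignoring full-line and in-line comments.
function parse_robots_lines($text)
{
    $records = array();
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        // A '#' starts a comment that runs to the end of the line,
        // even when it appears after a directive.
        $hash = strpos($line, '#');
        if ($hash !== false) {
            $line = substr($line, 0, $hash);
        }
        $line = trim($line);
        if ($line === '') {
            continue; // blank line or comment-only line
        }
        // Split at the first colon only, since values may contain colons.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // not a "Field: value" line
        }
        $records[] = array(strtolower(trim($parts[0])), trim($parts[1]));
    }
    return $records;
}
```

Field names are lower-cased so that `Disallow` and `disallow` compare equal; grouping the pairs by user agent is left out of this sketch.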
Adherence to the policy defined in robots.txt is a way for non-end-user HTTP clients to demonstrate their "good behavior".
Ironically, fetching robots.txt before every file access would itself be a demonstration of bad behavior, and could get your access to the resources banned. To avoid this, the class uses an aggressive caching strategy. The last robots.txt file checked is cached live by the class for up to 20 seconds, then the file system and database caches are checked. Robby will permanently cache robots.txt to either the file system, a MySQLi database, or both (to allow transition from one to the other).
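The lookup order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not Robby's implementation: the class name `TieredRobotsCache`, its methods, and the file naming scheme are all hypothetical, and the database tier is omitted to keep the example short.

```php
<?php
// Hypothetical sketch, NOT Robby's actual code: a live (in-memory) copy of
// the last robots.txt checked, backed by a permanent file system cache.
class TieredRobotsCache
{
    const LIVE_TTL = 20; // seconds the last robots.txt is kept live

    private $dir;
    private $liveHost = null;
    private $liveBody = null;
    private $liveTime = 0;

    public function __construct($dir)
    {
        $this->dir = rtrim($dir, '/');
    }

    // Assumed naming scheme: one file per host, keyed by a hash of the name.
    private function path($host)
    {
        return $this->dir . '/' . sha1(strtolower($host)) . '.robots';
    }

    private function remember($host, $body, $now)
    {
        $this->liveHost = $host;
        $this->liveBody = $body;
        $this->liveTime = $now;
    }

    // Returns the cached body, or null when the caller must fetch robots.txt.
    public function get($host, $now = null)
    {
        $now = ($now === null) ? time() : $now;
        // Tier 1: the live copy, valid for up to LIVE_TTL seconds.
        if ($this->liveHost === $host && ($now - $this->liveTime) <= self::LIVE_TTL) {
            return $this->liveBody;
        }
        // Tier 2: the file system cache.
        $file = $this->path($host);
        if (is_file($file)) {
            $body = file_get_contents($file);
            if ($body !== false) {
                $this->remember($host, $body, $now);
                return $body;
            }
        }
        return null;
    }

    // Stores a freshly fetched robots.txt in both tiers.
    public function put($host, $body, $now = null)
    {
        $now = ($now === null) ? time() : $now;
        file_put_contents($this->path($host), $body, LOCK_EX);
        $this->remember($host, $body, $now);
    }
}
```

The optional `$now` argument exists only so the 20-second window can be exercised without waiting; a real caller would leave it out and let `time()` be used.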
Applications using Robby are responsible for respecting any crawl-delay directive in a given site's robots.txt, and are strongly advised to do so.
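As an illustration of what respecting a crawl-delay might look like on the application side, here is a minimal sketch. The function names `seconds_to_wait` and `wait_for_crawl_delay` are hypothetical and not part of Robby, and the sketch assumes the application has already obtained the delay value (in seconds) and tracks the time of its last request to each site.

```php
<?php
// Hypothetical helpers, NOT part of Robby.

// Seconds still to wait before the next request to a site is allowed.
function seconds_to_wait($crawlDelay, $lastRequestAt, $now)
{
    $remaining = ($lastRequestAt + $crawlDelay) - $now;
    return $remaining > 0 ? $remaining : 0;
}

// Sleeps until the crawl-delay has elapsed, then returns the current time
// so the caller can record it as the new "last request" timestamp.
function wait_for_crawl_delay($crawlDelay, $lastRequestAt)
{
    $wait = seconds_to_wait($crawlDelay, $lastRequestAt, microtime(true));
    if ($wait > 0) {
        usleep((int) round($wait * 1000000)); // usleep() takes microseconds
    }
    return microtime(true);
}
```

An application would call `wait_for_crawl_delay()` immediately before each request to a given site and keep one timestamp per site.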
You can ask for help with Robby on the mailing list.
View the current: README • License (GPL)
Requires PHP 5.3 or greater. Can use MySQLi if available, otherwise requires write access to the file system for caching.