Spider website

Package:
Spider website
Summary:
Crawl a site and retrieve the URLs of all links
Groups:
HTML, PHP 5, Searching
Author:
Karol Janyst
Description:
This class can be used to crawl a site and retrieve the URLs of all links.

It can retrieve a page of a site and follow all links recursively to retrieve all the site URLs.
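The crawling step described above can be sketched roughly as follows. This is a generic Python illustration of the technique, not the class's own PHP code or API: the names `LinkCollector`, `crawl`, and the caller-supplied `fetch` function are all hypothetical, and the sketch uses an explicit stack instead of recursion and assumes that only links on the starting host should be followed.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch):
    """Return the sorted list of same-host URLs reachable from start_url.

    fetch is a hypothetical caller-supplied function that takes a URL and
    returns the page HTML as a string, or None if the page is unavailable.
    """
    host = urlparse(start_url).netloc
    seen = set()
    pending = [start_url]  # explicit stack standing in for recursion
    while pending:
        url = pending.pop()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue  # the URL is still recorded, but has no links to follow
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            # Assumption: stay on the host of the starting page.
            if urlparse(absolute).netloc == host and absolute not in seen:
                pending.append(absolute)
    return sorted(seen)
```

Passing the fetch function in as a parameter keeps the sketch independent of any particular HTTP client; for a quick trial, `fetch` can simply be the `get` method of a dictionary that maps URLs to HTML strings.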

The class can restrict the crawling to URLs with a given extension. It also skips pages disallowed in the site's robots.txt file and pages marked with the noindex or nofollow robots meta tags.
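The three checks described above might look something like this. Again, this is a generic Python sketch rather than the class's actual PHP implementation: `has_extension`, `load_robots`, `page_policy`, and `RobotsMetaParser` are hypothetical names, and how the class itself treats noindex versus nofollow is an assumption here (noindex is read as "do not list this page", nofollow as "do not follow its links").

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Records the directives of <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_extension(url, extensions):
    """True if the URL path ends with one of the given extensions."""
    path = urlparse(url).path.lower()
    return any(path.endswith("." + ext.lower()) for ext in extensions)


def load_robots(robots_txt):
    """Build a rule set from the text of a robots.txt file."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def page_policy(html):
    """Return (may_index, may_follow) according to the robots meta tags."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return (
        "noindex" not in parser.directives,
        "nofollow" not in parser.directives,
    )
```

In a crawler, the extension and robots.txt checks can run before a page is requested, for example `has_extension(url, ["html"]) and rules.can_fetch("*", url)`, while the meta tag check can only run after the page has been downloaded.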

