Spider website
Package:
Summary:
Crawl a site and retrieve the URLs of all its links
Groups:
Author:
Description:
This class can be used to crawl a site and retrieve the URLs of all its links.
It retrieves a page of the site and follows all links recursively to collect the site's URLs.
The class can restrict crawling to URLs with a given extension, and it avoids accessing pages disallowed by the site's robots.txt file or pages marked with the noindex or nofollow meta tags.
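The class's actual API is not shown on this page. As a rough illustration only, the following minimal Python sketch reproduces the crawling behavior described: fetch a start page, follow same-site links recursively, skip URLs disallowed by robots.txt, and honor noindex/nofollow robots meta tags. All names here (crawl, LinkAndMetaParser, the example URL) are hypothetical, not part of the class.

from html.parser import HTMLParser
from urllib import robotparser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkAndMetaParser(HTMLParser):
    """Collect href targets and the content of any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.robots_meta = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots_meta = attrs.get("content", "").lower()


def crawl(start_url, max_pages=100):
    """Return the set of same-site URLs reachable from start_url."""
    site = urlparse(start_url)
    robots = robotparser.RobotFileParser(f"{site.scheme}://{site.netloc}/robots.txt")
    robots.read()

    seen, queue, found = set(), [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        # Skip already-visited URLs and pages disallowed by robots.txt.
        if url in seen or not robots.can_fetch("*", url):
            continue
        seen.add(url)

        try:
            html = urlopen(url).read().decode("utf-8", errors="replace")
        except OSError:
            continue

        parser = LinkAndMetaParser()
        parser.feed(html)

        # Honor the robots meta tag: noindex pages are not recorded,
        # and links on nofollow pages are not followed.
        if "noindex" not in parser.robots_meta:
            found.add(url)
        if "nofollow" in parser.robots_meta:
            continue

        for href in parser.links:
            absolute = urljoin(url, href)
            if urlparse(absolute).netloc == site.netloc:
                queue.append(absolute)
    return found


if __name__ == "__main__":
    for link in sorted(crawl("https://example.com/")):
        print(link)

A filter on the URL's file extension, as the class also supports, could be added where links are appended to the queue.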