PHP Classes

Crawler: Extract links and images from remote Web pages

Ratings: 52%
Unique User Downloads: Total: 6,443; This week: 1
Download Rankings: All time: 327; This week: 560 (down)
Version: crawler 1.1
License: Freely Distributable
PHP version: 4.0
Categories: HTML, Web services
Description 

This class can be used to extract links and images from remote Web pages.

It can access Web pages, parse the pages' HTML, and extract the URLs of the links and the images.
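
As a rough illustration of the parse-and-extract step, here is a minimal sketch that uses PHP's built-in DOMDocument. It is not the Crawler class API: the function name `extractLinksAndImages` is hypothetical, and the code assumes a modern PHP (7.0 or later) rather than the PHP 4 this package targets.

```php
<?php
// Hypothetical helper for illustration only; not part of the Crawler package.
function extractLinksAndImages(string $html): array
{
    $result = ['links' => [], 'images' => []];
    if (trim($html) === '') {
        return $result; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely well formed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // href attributes of <a> elements
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') {
            $result['links'][] = $href;
        }
    }

    // src attributes of <img> elements
    foreach ($doc->getElementsByTagName('img') as $image) {
        $src = trim($image->getAttribute('src'));
        if ($src !== '') {
            $result['images'][] = $src;
        }
    }

    // Drop duplicates and reindex so the lists are plain arrays.
    $result['links']  = array_values(array_unique($result['links']));
    $result['images'] = array_values(array_unique($result['images']));
    return $result;
}

// Usage with an inline sample page; a remote page could be loaded first
// with file_get_contents() if allow_url_fopen is enabled.
$sample = '<html><body><a href="/about">About</a><img src="logo.png"></body></html>';
print_r(extractLinksAndImages($sample));
```

The URLs are returned as written in the page, so relative ones would still need to be resolved against the page address before they can be fetched.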

If necessary, the class can access a login page and emulate the submission of a login form, so that subsequent accesses are done on behalf of the logged-in user.
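
The login emulation described above usually comes down to posting the form fields and replaying the session cookies on later requests. The following is a minimal sketch of that idea using PHP's HTTP stream wrapper, not the package's own implementation. The function names (`cookieHeaderFrom`, `login`, `fetchAs`), the form field names, and the example URLs are all assumptions for illustration.

```php
<?php
// Hypothetical helpers for illustration only; not the Crawler class API.

/** Build a "Cookie" header value from the Set-Cookie lines of a response. */
function cookieHeaderFrom(array $responseHeaders): string
{
    $cookies = [];
    foreach ($responseHeaders as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue;
        }
        // Keep only "name=value", dropping attributes such as Path or Expires.
        $cookie    = trim(substr($line, strlen('Set-Cookie:')));
        $nameValue = trim(explode(';', $cookie, 2)[0]);
        $parts     = explode('=', $nameValue, 2);
        if (count($parts) === 2 && $parts[0] !== '') {
            $cookies[$parts[0]] = $parts[1]; // a later cookie replaces an earlier one
        }
    }
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

/** POST the login form and return the cookies to send with later requests. */
function login(string $loginUrl, array $formFields): string
{
    $context = stream_context_create(['http' => [
        'method'          => 'POST',
        'header'          => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content'         => http_build_query($formFields),
        'follow_location' => 0,    // read the cookies set by the login response itself
        'ignore_errors'   => true, // do not fail on non-2xx status codes
    ]]);
    $body = @file_get_contents($loginUrl, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not reach $loginUrl");
    }
    // The HTTP wrapper fills $http_response_header in the local scope.
    return cookieHeaderFrom($http_response_header ?? []);
}

/** Fetch a page while presenting the session cookies obtained at login. */
function fetchAs(string $url, string $cookieHeader): string
{
    $context = stream_context_create(['http' => [
        'header' => "Cookie: $cookieHeader\r\n",
    ]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}

// Usage (assumed URLs and field names, shown as comments to avoid network access):
// $cookies = login('https://example.test/login', ['username' => 'alice', 'password' => 'secret']);
// $html    = fetchAs('https://example.test/members', $cookies);
```

Sites that add hidden tokens to the login form or rely on JavaScript would need extra steps that this sketch does not cover.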

Innovation Award
PHP Programming Innovation award nominee
March 2008
Number 7
Prize: One copy of Delphi for PHP

Retrieving Web pages from remote sites is a relatively easy task in PHP.

If you want to crawl a site to search for something in its pages, you only need to retrieve the site pages, use some regular expressions to extract the site links, and retrieve the linked pages until all pages have been followed.
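
The procedure just described can be sketched as a breadth-first loop over a queue of URLs. This is an illustrative sketch, not code from the package: the names `extractHrefs`, `toAbsolute`, and `crawl` are hypothetical, page retrieval is passed in as a callable so the loop can be tried without network access, and URL resolution is deliberately simplified.

```php
<?php
// Hypothetical helpers for illustration only; not the Crawler class API.

/** Pull href values out of <a> tags with a regular expression. */
function extractHrefs(string $html): array
{
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);
    return $matches[2];
}

/** Simplified URL resolution: does not normalize "../" or query-only links. */
function toAbsolute(string $href, string $base): ?string
{
    $href = html_entity_decode(trim($href));
    $href = preg_replace('/#.*$/', '', $href); // drop the fragment
    if ($href === '') {
        return null;
    }
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return null; // mailto:, javascript:, and other non-HTTP schemes
    }
    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if (substr($href, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $href; // protocol-relative link
    }
    if ($href[0] === '/') {
        return $origin . $href; // root-relative link
    }
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href; // relative to the current directory
}

/**
 * Visit same-host pages until none are left or $maxPages is reached.
 * $fetch receives a URL and returns the HTML, or null on failure.
 */
function crawl(string $startUrl, callable $fetch, int $maxPages = 50): array
{
    $host    = parse_url($startUrl, PHP_URL_HOST);
    $queue   = [$startUrl];
    $seen    = [$startUrl => true];
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if ($html === null) {
            continue; // page could not be retrieved
        }
        $visited[] = $url;
        foreach (extractHrefs($html) as $href) {
            $next = toAbsolute($href, $url);
            if ($next === null || isset($seen[$next])) {
                continue;
            }
            if (parse_url($next, PHP_URL_HOST) !== $host) {
                continue; // stay on the starting site
            }
            $seen[$next] = true;
            $queue[]     = $next;
        }
    }
    return $visited;
}

// Usage with a tiny in-memory "site" standing in for real HTTP requests.
$pages = [
    'http://example.test/'       => '<a href="/a.html">A</a>',
    'http://example.test/a.html' => '<a href="/">Home</a>',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? null;
};
print_r(crawl('http://example.test/', $fetch));
```

The `$seen` set is what keeps the loop from revisiting pages that link back to each other, and the `$maxPages` cap bounds the crawl on large sites.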

However, if some pages can only be accessed by authenticated users, the problem is no longer so simple.

This package provides a more complete solution to the problem of crawling site pages by authenticating automatically, so it can access pages restricted to logged-in users.

Manuel Lemos
Name: Md. Shaiful islam (available for paid consulting)
Classes: 1 package
Country: United States
Age: 40
All time rank: 43153 in United States
Week rank: 416 (up), 46 in United States (up)
Innovation award
Nominee: 1x

Files
File                       Role     Description
Crawler.php                Class    The class
ExampleCrawlImage.php      Example  Crawl images from the http://www.phpclasses.org/ site
ExampleCrawlLink.php       Example  Crawl links from the http://www.phpclasses.org/ site
ExampleLoginCrawlLink.php  Example  Log in and crawl links from a site

User Ratings (all time)
Utility: 75%
Consistency: 69%
Documentation: -
Examples: 76%
Tests: -
Videos: -
Overall: 52%
Rank: 2405

User Comments (3)
Excellent!
2 years ago (Jeff Dudas), 70%
Does not work for LinkedIn
11 years ago (Mansoor Rana), 12%
Lacking recursion, it doesn't actually crawl.
15 years ago (wahoo frankinson), 32%