dezento / crawlify
Fast Concurrent Crawler
1.0
2021-05-31 19:28 UTC
Requires
- php: ^8.0
- dezento/effective-url-middleware: ^1.0
- guzzlehttp/guzzle: ^7.3
- illuminate/collections: ^8.38
- symfony/css-selector: ^5.2
- symfony/dom-crawler: ^5.2
Requires (Dev)
- symfony/var-dumper: ^5.2
README
Installation
composer require dezento/crawlify
Overview
Crawlify is a lightweight crawler for manipulating HTML, XML, and JSON using DomCrawler.
It uses GuzzleHttp\Pool to make concurrent requests, which means all Guzzle Request Options are available.
Results are returned wrapped in Laravel Collections.
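To make the concurrency model concrete, here is a minimal sketch of a plain GuzzleHttp\Pool run that groups outcomes under 'fulfilled' and 'rejected' keys, similar in spirit to the ->get('fulfilled') call used in the examples. It is not Crawlify's actual implementation: the example.test URLs, the canned MockHandler responses (used so that nothing touches the network), and the $results layout are all assumptions for illustration.

```php
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Handler\MockHandler;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Response;

// Canned responses stand in for real servers (assumption: illustration only).
$mock = new MockHandler([
    new Response(200, [], '{"id":1}'),
    new Response(404),
    new Response(200, [], '{"id":3}'),
]);
$client = new Client(['handler' => HandlerStack::create($mock)]);

// Hypothetical URLs; they are never contacted because of the mock handler.
$requests = [
    new Request('GET', 'https://example.test/posts/1'),
    new Request('GET', 'https://example.test/posts/2'),
    new Request('GET', 'https://example.test/posts/3'),
];

$results = ['fulfilled' => [], 'rejected' => []];

$pool = new Pool($client, $requests, [
    'concurrency' => 2, // at most two requests in flight at once
    'fulfilled' => function ($response, $index) use (&$results) {
        $results['fulfilled'][$index] = (string) $response->getBody();
    },
    'rejected' => function ($reason, $index) use (&$results) {
        // With Guzzle's default http_errors behaviour, 4xx/5xx responses land here.
        $results['rejected'][$index] = $reason->getMessage();
    },
]);

// Start the transfers and block until every request has settled.
$pool->promise()->wait();

print_r($results);
```

Because failures are collected rather than thrown, one bad URL does not abort the rest of the batch.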
Examples
CRAWL JSON
use Dezento\Crawlify;

$links = [];
for ($i = 1; $i <= 100; $i++) {
    $links[] = 'https://jsonplaceholder.typicode.com/posts/' . $i;
}

$json = (new Crawlify(collect($links))) // accepts an array or a Collection of links
    ->settings([
        'type' => 'JSON' // Crawlify-specific option
    ])
    ->fetch()
    ->get('fulfilled')
    ->map(fn ($p) => collect(json_decode($p->response)))
    ->dd();
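The map() step above decodes each response with json_decode(). As a minimal sketch of that step in isolation, the snippet below decodes a hard-coded body. The sample payload and its values are made up for illustration (the field names follow the posts served by jsonplaceholder.typicode.com), and plain arrays are used instead of a Collection.

```php
<?php
// Made-up sample body (assumption: illustration only, not a real response).
$body = '{"userId": 1, "id": 7, "title": "sample title", "body": "sample text"}';

// Decode to an associative array; JSON_THROW_ON_ERROR turns malformed
// input into a JsonException instead of a silent null.
$post = json_decode($body, true, 512, JSON_THROW_ON_ERROR);

echo $post['id'] . ': ' . $post['title'] . PHP_EOL;

// Malformed input is reported explicitly rather than ignored.
try {
    json_decode('{"id": ', true, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    echo 'Invalid JSON: ' . $e->getMessage() . PHP_EOL;
}
```

Throwing on malformed input can be useful when crawling many URLs, since a plain json_decode() call returns null for both invalid JSON and a literal JSON null.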
CRAWL XML
For traversing XML, refer to the DomCrawler documentation.
$xml = (new Crawlify([
    'https://www.nytimes.com/svc/collections/v1/publish/https://www.nytimes.com/section/world/rss.xml',
]))
    ->fetch()
    ->get('fulfilled')
    ->map(fn ($item) =>
        collect($item->response->filter('item')->children())
            ->map(fn ($data) => $data->textContent)
    )
    ->dd();
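Note that DomCrawler's children() returns the children of the first matched node only, so a feed with several <item> elements may need explicit iteration. The sketch below uses symfony/dom-crawler directly on a made-up inline RSS fragment (an assumption for illustration, not the feed above) and collects the title and link of every item.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Made-up RSS fragment (assumption: illustration only).
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First story</title><link>https://example.test/1</link></item>
    <item><title>Second story</title><link>https://example.test/2</link></item>
  </channel>
</rss>
XML;

$crawler = new Crawler();
$crawler->addXmlContent($xml);

// each() runs the closure once per matched <item> and returns an array.
$items = $crawler->filter('item')->each(fn (Crawler $item) => [
    'title' => $item->filter('title')->text(),
    'link'  => $item->filter('link')->text(),
]);

print_r($items);
```

The same each() call should work on the Crawler instance that Crawlify exposes as $item->response, assuming it is a regular DomCrawler object as the example above suggests.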
CRAWL HTML
For traversing HTML, refer to the DomCrawler documentation.
$html = (new Crawlify([
    'https://en.wikipedia.org/wiki/Category:Lists_of_spider_species_by_family'
]))
    ->settings([
        #'proxy' => 'http://username:password@192.168.16.1:10',
        'concurrency' => 5,
        'delay' => 0
    ])
    ->fetch()
    ->get('fulfilled')
    ->map(fn ($item) =>
        collect($item->response->filter('a')->links())
            ->map(fn ($el) => $el->getUri())
    )
    ->reject(fn ($a) => $a->isEmpty())
    ->dd();
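As a minimal sketch of the link-extraction step on its own, the snippet below runs the same filter('a')->links() chain with symfony/dom-crawler on a made-up inline page. The markup and the example.test base URI are assumptions for illustration. The base URI passed to the Crawler is what lets getUri() resolve relative hrefs.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Made-up page (assumption: illustration only).
$html = <<<HTML
<html><body>
  <a href="/wiki/Alpha">Alpha</a>
  <a href="https://example.org/beta">Beta</a>
</body></html>
HTML;

// The second argument is the URI the document was (hypothetically) fetched from.
$crawler = new Crawler($html, 'https://example.test/wiki/Start');

// links() returns Link objects; getUri() yields absolute URIs.
$uris = array_map(
    fn ($link) => $link->getUri(),
    $crawler->filter('a')->links()
);

print_r($uris);
```

Here the relative href is resolved against the base URI, while the already absolute one is returned unchanged.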
OPTIONS
->settings([
    'proxy' => 'http://username:password@192.168.16.1:10',
    'concurrency' => 5,
    'delay' => 0,
    ....
])
For the available options, refer to the Guzzle Request Options documentation.
The only Crawlify-specific option is 'type' => 'JSON'.
Note
Before using the dd() helper, you must install it:
composer require symfony/var-dumper