NAME

combineCtrl - controls a Combine crawling job

SYNOPSIS

combineCtrl <action> --jobname <name>

where action can be one of start, kill, load, recyclelinks, reharvest, stat, howmany, records, hosts, initMemoryTables, open, stop, pause, continue

OPTIONS AND ARGUMENTS

jobname is used to find the appropriate configuration (mandatory)

Actions starting/killing crawlers

start

takes an optional switch --harvesters n where n is the number of crawler processes to start

kill

kills all active crawlers (and their associated combineRun monitors) for jobname
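As a minimal sketch of how the two actions above might be combined, the helper below restarts the crawlers of one job. The function name restart_crawlers and its argument order are hypothetical; only the kill and start actions and the --jobname and --harvesters switches come from this page.

```shell
# Hypothetical helper: restart the crawlers of one Combine job.
# $1 = job name, $2 = optional number of crawler processes
restart_crawlers() {
    combineCtrl kill --jobname "$1"
    if [ -n "${2:-}" ]; then
        combineCtrl start --jobname "$1" --harvesters "$2"
    else
        # --harvesters is optional, so it is left out when no count is given
        combineCtrl start --jobname "$1"
    fi
}
```

For example, restart_crawlers aatest 3 would kill the crawlers of job aatest and then start three new ones.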

Actions loading or recycling URLs for crawling

load

Reads a list of URLs from STDIN (one per line) and schedules them for crawling

recyclelinks

Schedules all links newly found in crawled pages (since the last invocation of recyclelinks) for crawling

reharvest

Schedules all pages in the database for crawling again (in order to check if they have changed)
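Because load reads one URL per line from STDIN, a seed file can be piped straight into it. The sketch below assumes a hypothetical seed file that may contain blank lines and lines starting with '#'; dropping those lines is a convention of this example, not a documented feature of combineCtrl. The function name seed_job is likewise hypothetical.

```shell
# Hypothetical helper: feed a seed file to the load action.
# $1 = job name, $2 = file with one URL per line
seed_job() {
    # Assumption of this sketch: blank lines and '#' lines are not URLs,
    # so they are filtered out before load sees them.
    grep -v -e '^[[:space:]]*$' -e '^#' "$2" | combineCtrl load --jobname "$1"
}
```

A call such as seed_job aatest seeds.txt would then schedule every remaining line of seeds.txt for crawling.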

Actions for controlling scheduling of URLs

open

opens the database for URL scheduling (for example, after a stop)

stop

stops URL scheduling

pause

pauses URL scheduling

continue

continues URL scheduling after a pause
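One possible use of pause and continue is to hold URL scheduling while some other command runs. The wrapper below is only a sketch: the name with_paused_scheduling is hypothetical, and the command being wrapped is whatever the caller supplies.

```shell
# Hypothetical wrapper: pause URL scheduling, run a command, then continue.
# $1 = job name, remaining arguments = the command to run while paused
with_paused_scheduling() {
    job=$1
    shift
    combineCtrl pause --jobname "$job"
    status=0
    "$@" || status=$?        # remember the command's exit status
    combineCtrl continue --jobname "$job"
    return "$status"         # report the wrapped command's result
}
```

Scheduling is resumed even if the wrapped command fails, and the wrapper returns that command's exit status.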

Misc actions

stat

prints out rudimentary status of the ready queue (i.e., URLs eligible for crawling now)

howmany

prints out rudimentary status of all URLs to be crawled

records

prints out the number of records in the SQL database

hosts

prints out rudimentary status of all hosts that have URLs to be crawled

initMemoryTables

initializes the administrative MySQL tables that are kept in memory
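The four reporting actions (stat, howmany, records, hosts) can be run back to back to get a quick overview of a job. The sketch below prints a header before each one and passes the output through unchanged, since the output format is not described here. The function name job_report is hypothetical.

```shell
# Hypothetical helper: run every reporting action for one job.
# $1 = job name
job_report() {
    for action in stat howmany records hosts; do
        printf '== %s ==\n' "$action"
        combineCtrl "$action" --jobname "$1"
    done
}
```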

DESCRIPTION

Implements various control functionality to administer a crawling job, like starting and stopping crawlers, injecting URLs into the crawl queue, scheduling newly found links for crawling, controlling scheduling, etc.

This is the preferred way of controlling a crawl job.

EXAMPLES

echo 'http://www.yourdomain.com/' | combineCtrl load --jobname aatest

Seed the crawling job aatest with a URL

combineCtrl start --jobname aatest --harvesters 3

Start 3 crawling processes for job aatest

combineCtrl recyclelinks --jobname aatest

Schedule all new links for crawling

combineCtrl stat --jobname aatest

See how many URLs are eligible for crawling right now.
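Since recyclelinks only schedules links found since its last invocation, it is typically run repeatedly while the crawl progresses. The following is a minimal sketch of such a loop; the function name recycle_rounds, the number of rounds and the waiting interval are all assumptions of this example rather than recommendations from the Combine documentation.

```shell
# Hypothetical helper: run recyclelinks a fixed number of times.
# $1 = job name, $2 = number of rounds, $3 = seconds to wait between rounds
recycle_rounds() {
    i=0
    while [ "$i" -lt "$2" ]; do
        combineCtrl recyclelinks --jobname "$1"
        i=$((i + 1))
        # Wait only between rounds, not after the last one
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
    done
}
```

For instance, recycle_rounds aatest 5 600 would schedule newly found links five times, ten minutes apart.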

SEE ALSO

combine

Combine configuration documentation in /usr/share/doc/combine/.

AUTHOR

Anders Ardö, <anders.ardo@it.lth.se>

COPYRIGHT AND LICENSE

Copyright (C) 2005 Anders Ardö

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.

See the file LICENCE included in the distribution at http://combine.it.lth.se/